I need to remove all HTML tags, except a tag for which any of the following holds:
- it is a <sub> tag
- it is preceded by one or more newlines followed by four or more spaces
- it is surrounded by backtick (`) characters
Here is an example:
var str = [
  "something1",
  "<sub>",
  "something2",
  "<div class='myclass'>something3</div>",
  "</sub>",
  "<div class='myclass'>something4</div>",
  "something5",
  "",
  "    <div class='myclass'>something6</div>",
  "<div class='myclass'>something7</div>",
  "`<div>something8</div>`",
  "something9"
].join("\n");
Expected output:
/*
something1
<sub>
something2
something3
</sub>
something4
something5

    <div class='myclass'>something6</div>
something7
`<div>something8</div>`
something9
*/
Here is what I've tried so far:
/\n\s{0,3}<.*[^>]+|<sub>.*?<\/sub>|`.*?`/gm
1 solution
#1
This is possible with a regex substitution. Use this regex with the mg modifiers:
(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>
And use $1 as the substitution.
There are several parts to this. The capturing group finds all the HTML you may want to keep:
- \n\n    .* matches an empty line followed by a line that starts with 4 spaces.
- `[^`]+` matches things in `backticks`.
- <\/?sub\b[^>]*> matches sub HTML tags, opening or closing.
The remaining HTML tags match <[^>]+>, which lies outside the capturing group. For those matches $1 is empty, so the tag is discarded.
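The substitution described above can be sketched in JavaScript as follows. This is only an illustration under a few assumptions: the helper name stripTags is hypothetical, the sample input assumes the "something6" line follows an empty line and is indented by four spaces, and the sub alternative uses [^>]* so that a bare <sub> with no attributes is also kept.

```javascript
// Group 1 captures everything that should survive; the last alternative
// matches any other tag. The four leading spaces are written as " {4}".
const TAG_FILTER = /(\n\n {4}.*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>/g;

// Hypothetical helper: replace every match with group 1. When only the
// last alternative matched, group 1 is empty and the tag disappears.
function stripTags(input) {
  return input.replace(TAG_FILTER, "$1");
}

// Sample input (assumed whitespace: an empty line, then a four-space indent
// before the "something6" line).
const sample = [
  "something1",
  "<sub>",
  "something2",
  "<div class='myclass'>something3</div>",
  "</sub>",
  "<div class='myclass'>something4</div>",
  "something5",
  "",
  "    <div class='myclass'>something6</div>",
  "<div class='myclass'>something7</div>",
  "`<div>something8</div>`",
  "something9",
].join("\n");

console.log(stripTags(sample));
```

Because the pattern contains no ^ or $ anchors, the m modifier makes no difference here, so the sketch uses only g. Note that tags nested inside <sub> are still stripped; only the <sub> and </sub> tags themselves are kept.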