I have HTML text in my database that contains many img tags. My goal is to remove the img tags that have a specific src.
My input is:
<div>
<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
<p>some other text goes here <img height="100" src='/upload/remove-me.png' width="200" /></p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>
I'd like to remove all images where src="/upload/remove-me.png", so that my output is:
<div>
<p>some text goes here</p>
<p>some other text goes here</p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>
Is there any way to do it with regex in TSQL?
4 solutions
#1
1
From your example it seems the tags can have their attributes in any order, so we need to loop through the text and take out the img tags one at a time. You will want to try this on a backed-up copy of your data first, to make sure it removes only what you want removed:
declare @HTML table(a nvarchar(max))
insert into @HTML
select
'<div>
<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
<p>some other text goes here <img height="100" src="/upload/remove-me.png" width="200" /></p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>'
declare @URL nvarchar(50) = 'src="/upload/remove-me.png"' -- Search for img tags with this text in.
declare @TagStart int
declare @TagLength int
-- Find the first '<img' that is followed by the URL and a closing '/>'. Returns 0 if there is none.
-- Note: the match starts at the first '<img' with the URL anywhere after it, so this assumes no other img tag comes before a matching one.
select @TagStart = patindex('%<img%' + @URL + '%/>%',a)
from @HTML
while @TagStart > 0
begin
    select @TagLength = patindex('%/>%',substring(a,@TagStart,999999999))+1 -- Length of the tag, up to and including its closing '/>'.
    from @HTML
    update @HTML -- Update the table to remove just this tag.
    set a = stuff(a,@TagStart,@TagLength,'')
    select @TagStart = patindex('%<img%' + @URL + '%/>%',a) -- Check if there are any more img tags with the URL to remove. Will return 0 if there are none.
    from @HTML
end
select a as CleanHTML
from @HTML
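To run the same idea against a real table rather than a single-row table variable, one option is a set-based loop that removes one tag per row on each pass. This is only a sketch: the table dbo.myTable and its nvarchar(max) column html are assumed names, not something given in the question, and it shares the pattern's limitation of matching the first '<img' that has the URL somewhere after it.

```sql
-- Assumed names: dbo.myTable with an nvarchar(max) column called html.
declare @Pattern nvarchar(100) = N'%<img%src="/upload/remove-me.png"%/>%'
declare @Rows int = 1

while @Rows > 0
begin
    -- Each pass removes the first matching tag from every row that still has one.
    update dbo.myTable
    set html = stuff(html
                    ,patindex(@Pattern, html) -- start of the tag
                    ,patindex('%/>%', substring(html, patindex(@Pattern, html), len(html))) + 1 -- length of the tag, including '/>'
                    ,'')
    where html like @Pattern

    set @Rows = @@rowcount -- 0 once no row matches any more
end
```

Because every pass shortens each matching row by at least one tag, the loop ends once no row matches the pattern.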
#2
2
XML DML gives a more elegant solution, provided the stored markup is well-formed XML (otherwise the conversion to xml fails). Most probably your main table stores the HTML in an (n)varchar(max) column, so a temporary table with an xml column is necessary.
declare @HTML table(id int, a xml)
insert into @HTML
select id, html -- the (n)varchar(max) value is converted to xml on insert
from dbo.myTable
/* content of html field
'<div>
<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
<p>some other text goes here <img height="100" src="/upload/remove-me.png" width="200" /></p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>'
*/
update @HTML
set a.modify('delete //img[contains(@src,"remove-me")]') --delete nodes and update
from @HTML cross apply a.nodes('div') t(v)
--select * from @HTML --just to see what happens
update t
set html = cast(h.a as nvarchar(max)) -- xml is not converted back to (n)varchar implicitly, so cast it
from dbo.myTable t
inner join @HTML h on t.id = h.id
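To try the delete on its own, without touching any table, the same XML DML can be applied to an xml variable. The sketch below uses an exact match on @src instead of contains(), which is closer to what the question asks for; the variable name @doc is made up for the illustration.

```sql
declare @doc xml = N'<div>
<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
<p>some other text goes here <img height="100" src=''/upload/remove-me.png'' width="200" /></p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>'

-- Delete every img element whose src attribute equals the given path exactly.
set @doc.modify('delete //img[@src="/upload/remove-me.png"]')

-- Cast back to text to inspect the result.
select cast(@doc as nvarchar(max)) as CleanHTML
```

Since the markup is parsed as XML, it does not matter whether an attribute was written with single or double quotes. The serialized result may differ cosmetically from the stored text, for example in quoting and whitespace inside tags.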
#3
0
If the img tag is constant as a whole (not just the src):
<img height="100" src='/upload/remove-me.png' width="200" />
then you can use a simple REPLACE, like this:
UPDATE tablename SET columnname=REPLACE(
columnname,
N' <img height="100" src=''/upload/remove-me.png'' width="200" />',
N''
)
WHERE columnname LIKE N'% <img height="100" src=''/upload/remove-me.png'' width="200" />%'
The space before the tag is intentional. If the markup is stored in an ntext column, convert it to nvarchar(max) first, otherwise REPLACE will fail.
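For an ntext column, the conversion can be done inline. A sketch, using the same placeholder names tablename and columnname as above:

```sql
-- REPLACE does not accept ntext, so cast the column to nvarchar(max) first.
UPDATE tablename SET columnname=REPLACE(
    CAST(columnname AS nvarchar(max)),
    N' <img height="100" src=''/upload/remove-me.png'' width="200" />',
    N''
)
WHERE columnname LIKE N'% <img height="100" src=''/upload/remove-me.png'' width="200" />%'
```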
If this is more than a one-off data correction, you should handle it in your business logic layer instead.
#4
0
The following function should do the job. It finds the start and end of each img tag, checks whether the tag contains the targeted image name, and removes the tag if it does.
CREATE FUNCTION dbo.Html_RemoveImageAttributes
(
    @sourceImage NVARCHAR(100),
    @inputHtml NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @outputHtml NVARCHAR(MAX) = @inputHtml;
    DECLARE @imageTagStart INT = CHARINDEX('<img ', @outputHtml);
    DECLARE @imageTagEnd INT;
    DECLARE @tagLength INT;
    WHILE (@imageTagStart > 0)
    BEGIN
        SET @imageTagEnd = CHARINDEX('/>', @outputHtml, @imageTagStart);
        -- Stop at an img tag that is never closed with '/>'.
        IF (@imageTagEnd = 0) BREAK;
        SET @tagLength = @imageTagEnd - @imageTagStart + 2;
        IF (CHARINDEX(@sourceImage, SUBSTRING(@outputHtml, @imageTagStart, @tagLength)) > 0)
        BEGIN
            -- The tag references the image: remove it, then search again from the same position.
            SET @outputHtml = STUFF(@outputHtml, @imageTagStart, @tagLength, '');
            SET @imageTagStart = CHARINDEX('<img ', @outputHtml, @imageTagStart);
        END
        ELSE
        BEGIN
            -- Some other image: move on to the next img tag.
            SET @imageTagStart = CHARINDEX('<img ', @outputHtml, @imageTagEnd + 2);
        END
    END
    RETURN @outputHtml;
END
The following example shows how it can be used:
DECLARE @sourceImage NVARCHAR(50) = 'remove-me.png';
DECLARE @input NVARCHAR(4000) = N'<div>
<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
<p>some other text goes here <img height="100" src=''/upload/remove-me.png'' width="200" /></p>
<p>some other text goes here <img src="/upload/filename.png" /></p>
</div>';
PRINT dbo.Html_RemoveImageAttributes(@sourceImage, @input);
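Once the function exists, it can also be applied directly in an UPDATE. A sketch, assuming a table dbo.myTable with an nvarchar(max) column html (names not given in the question):

```sql
UPDATE dbo.myTable
SET html = dbo.Html_RemoveImageAttributes(N'remove-me.png', html)
WHERE html LIKE N'%remove-me.png%' -- skip rows that cannot contain the image
```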