如何替换字符串中的奇怪图案？

I'm in the process of creating a temporary procedure in SQL because I have a value of a table which is written in markdown, so it appear as rendered HTML in the web browser (markdown to HTML conversion).

我正在使用SQL创建临时过程,因为我有一个以markdown编写的表的值,因此它在Web浏览器中显示为呈现的HTML(降级为HTML转换)。

String of the column currently look like this:

列的字符串当前如下所示:

Questions about **general computing hardware and software** are off-topic for Stack Overflow unless they directly involve tools used primarily for programming. You may be able to get help on [Super User](http://superuser.com/about)

I'm currently working with bold and italic text. This mean (in the case of bold text) I will need to replace odd N times the pattern**with<b>and even times with</b>.
I saw replace() but it perform the replacement on all the patterns of the string.

我目前正在使用粗体和斜体文本。这意味着(在粗体文本的情况下)我需要用替换模式**的奇数N次,用替换偶数次。我看到了replace()但它在字符串的所有模式上执行替换。

So How I can replace a sub-string only if it is odd or only it is even?

那么我怎样才能替换子字符串,只要它是奇数或只是偶数?

Update: Some peoples wonder what schemas I'm using so just take a look here.

更新:有些人想知道我正在使用什么模式,所以请看一下这里。

One more extra if you want: The markdown style hyperlink to html hyperlink doesn't look so simple.

如果你想要的话还有一个额外的东西:html超链接的降价样式超链接看起来并不那么简单。

4 个解决方案

#1

Using theSTUFFfunction and a simpleWHILEloop:

使用STARTFF功能和simpleWHILEloop:

CREATE FUNCTION dbo.fn_OddEvenReplace(@text nvarchar(500), 
                                      @textToReplace nvarchar(10), 
                                      @oddText nvarchar(10), 
                                      @evenText nvarchar(500))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @counter tinyint
    SET @counter = 1

    DECLARE @switchText nvarchar(10)
    WHILE CHARINDEX(@textToReplace, @text, 1) > 0
    BEGIN
        SELECT @text = STUFF(@text, 
                    CHARINDEX(@textToReplace, @text, 1), 
                    LEN(@textToReplace), 
                    IIF(@counter%2=0,@evenText,@oddText)),
                @counter = @counter + 1
    END
    RETURN @text
END

And you can use it like this:

你可以像这样使用它:

SELECT dbo.fn_OddEvenReplace(column, '**', '<b>', '</b>')
FROM table

UPDATE:

This is re-written as an SP:

这被重写为SP:

CREATE PROC dbo.##sp_OddEvenReplace @text nvarchar(500), 
                                  @textToReplace nvarchar(10), 
                                  @oddText nvarchar(10), 
                                  @evenText nvarchar(10),
                                  @returnText nvarchar(500) output
AS
BEGIN
    DECLARE @counter tinyint
    SET @counter = 1

    DECLARE @switchText nvarchar(10)
    WHILE CHARINDEX(@textToReplace, @text, 1) > 0
    BEGIN
        SELECT @text = STUFF(@text, 
                    CHARINDEX(@textToReplace, @text, 1), 
                    LEN(@textToReplace), 
                    IIF(@counter%2=0,@evenText,@oddText)),
                @counter = @counter + 1
    END
    SET @returnText = @text
END
GO

And to execute:

并执行:

DECLARE @returnText nvarchar(500)
EXEC dbo.##sp_OddEvenReplace '**a** **b** **c**', '**', '<b>', '</b>', @returnText output

SELECT @returnText

#2

As per OP's request I have modified my earlier answer to perform as a temporary stored procedure. I have left my earlier answer as I believe the usage against a table of strings to be useful also.

根据OP的请求,我修改了我之前的答案,以执行临时存储过程。我已经离开了我之前的答案,因为我相信对字符串表的使用也是有用的。

If a Tally (or Numbers) table is known to already exist with at least 8000 values, then the marked section of the CTE can be omitted and the CTE reference tally replaced with the name of the existing Tally table.

如果已知Tally(或Numbers)表已存在至少8000个值,则可以省略CTE的标记部分,并将CTE引用标记替换为现有Tally表的名称。

create procedure #HtmlTagExpander(
     @InString   varchar(8000) 
    ,@OutString  varchar(8000)  output
)as 
begin
    declare @Delimiter  char(2) = '**';

    create table #t( 
         StartLocation  int             not null
        ,EndLocation    int             not null

        ,constraint PK unique clustered (StartLocation desc)
    );

    with 
          -- vvv Only needed in absence of Tally table vvv
    E1(N) as ( 
        select 1 from (values
            (1),(1),(1),(1),(1),
            (1),(1),(1),(1),(1)
        ) E1(N)
    ),                                              --10E+1 or 10 rows
    E2(N) as (select 1 from E1 a cross join E1 b),  --10E+2 or 100 rows
    E4(N) As (select 1 from E2 a cross join E2 b),  --10E+4 or 10,000 rows max
    tally(N) as (select row_number() over (order by (select null)) from E4),
          -- ^^^ Only needed in absence of Tally table ^^^

    Delimiter as (
        select len(@Delimiter)     as Length,
               len(@Delimiter)-1   as Offset
    ),
    cteTally(N) AS (
        select top (isnull(datalength(@InString),0)) 
            row_number() over (order by (select null)) 
        from tally
    ),
    cteStart(N1) AS 
        select 
            t.N 
        from cteTally t cross join Delimiter 
        where substring(@InString, t.N, Delimiter.Length) = @Delimiter
    ),
    cteValues as (
        select
             TagNumber = row_number() over(order by N1)
            ,Location   = N1
        from cteStart
    ),
    HtmlTagSpotter as (
        select
             TagNumber
            ,Location
        from cteValues
    ),
    tags as (
        select 
             Location       = f.Location
            ,IsOpen         = cast((TagNumber % 2) as bit)
            ,Occurrence     = TagNumber
        from HtmlTagSpotter f
    )
    insert #t(StartLocation,EndLocation)
    select 
         prev.Location
        ,data.Location
    from tags data
    join tags prev
       on prev.Occurrence = data.Occurrence - 1
      and prev.IsOpen     = 1;

    set @outString = @Instring;

    update this
    set @outString = stuff(stuff(@outString,this.EndLocation,  2,'</b>')
                                           ,this.StartLocation,2,'<b>')
    from #t this with (tablockx)
    option (maxdop 1);
end
go

Invoked like this:

像这样调用:

declare @InString   varchar(8000) 
       ,@OutString  varchar(8000);

set @inString = 'Questions about **general computing hardware and software** are off-topic **for Stack Overflow.';
exec #HtmlTagExpander @InString,@OutString out; select @OutString;

set @inString = 'Questions **about** general computing hardware and software **are off-topic** for Stack Overflow.';
exec #HtmlTagExpander @InString,@OutString out; select @OutString;
go

drop procedure #HtmlTagExpander;
go

It yields as output:

它产生输出:

Questions about <b>general computing hardware and software</b> are off-topic **for Stack Overflow.

Questions <b>about</b> general computing hardware and software <b>are off-topic</b> for Stack Overflow.

#3

One option is to use a Regular Expression as it makes replacing such patterns very simple. RegEx functions are not built into SQL Server so you need to use SQL CLR, either compiled by you or from an existing library.

一种选择是使用正则表达式,因为它使得替换这些模式非常简单。 RegEx函数未内置到SQL Server中,因此您需要使用由您或现有库编译的SQL CLR。

For this example I will use the SQL# (SQLsharp) library (which I am the author of) but the RegEx functions are available in the Free version.

对于此示例,我将使用SQL#(SQLsharp)库(我是其作者),但RegEx函数在免费版本中可用。

SELECT SQL#.RegEx_Replace
(
   N'Questions about **general computing hardware and software** are off-topic\
for Stack Overflow unless **they** directly involve tools used primarily for\
**programming. You may be able to get help on [Super User]\
(https://superuser.com/about)', -- @ExpressionToValidate
   N'\*\*([^\*]*)\*\*', -- @RegularExpression
   N'<b>$1</b>', -- @Replacement
   -1, -- @Count (-1 = all)
   1, - @StartAt
   'IgnoreCase' -- @RegEx options
);

The above pattern \*\*([^\*]*)\*\* just looks for anything surrounded by double-asterisks. In this case you don't need to worry about odd / even. It also means that you won't get a poorly-formed <b>-only tag if for some reason there is an extra ** in the string. I added two additional test cases to the original string: a complete set of ** around the word they and an unmatched set of ** just before the word programming. The output is:

上面的模式\ * \ *([^ \ *] *)\ * \ *只查找由双星号包围的任何内容。在这种情况下,您不必担心奇数/偶数。这也意味着如果由于某种原因字符串中有额外的**,您将不会得到格式不佳的 -only标记。我在原始字符串中添加了两个额外的测试用例:围绕单词的一整套**以及在单词编程之前的一组不匹配的**。输出是:

Questions about <b>general computing hardware and software</b> are off-topicfor Stack Overflow unless <b>they</b> directly involve tools used primarily for **programming. You may be able to get help on [Super User](https://superuser.com/about)

which renders as:

其呈现为:

Questions about general computing hardware and software are off-topicfor Stack Overflow unless they directly involve tools used primarily for **programming. You may be able to get help on Super User

关于通用计算硬件和软件的问题是Stack Overflow的主题,除非它们直接涉及主要用于**编程的工具。您可以获得超级用户的帮助

#4

This solution makes use of techniques described by Jeff Moden in this article on the Running Sum problem in SQL. This solution is lengthy, but by making use of the Quirky Update in SQL Server over a clustered index, holds the promise of being much more efficient over large data sets than cursor-based solutions.

此解决方案利用了Jeff Moden在本文中描述的SQL中的运行和问题。这个解决方案很冗长,但是通过在聚合索引上使用SQL Server中的Quirky Update,可以比基于游标的解决方案更有效地处理大型数据集。

Update - amended below to operate off a table of strings

更新 - 修改如下以操作字符串表

Assuming the existence of a tally table created like this (with at least 8000 rows):

假设存在这样创建的计数表(至少有8000行):

create table dbo.tally (
     N int not null
    ,unique clustered (N desc)
);
go

with 
E1(N) as ( 
    select 1 from (values
        (1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1)
    ) E1(N)
),                                              --10E+1 or 10 rows
E2(N) as (select 1 from E1 a cross join E1 b),  --10E+2 or 100 rows
E4(N) As (select 1 from E2 a cross join E2 b)   --10E+4 or 10,000 rows max
insert dbo.tally(N)
select row_number() over (order by (select null)) from E4;
go

and a HtmlTagSpotter function defined like this:

和一个像这样定义的HtmlTagSpotter函数:

create function dbo.HtmlTagSPotter(
     @pString       varchar(8000)
    ,@pDelimiter    char(2))
returns table with schemabinding as
return
   WITH 
        Delimiter as (
        select len(@pDelimiter)     as Length,
               len(@pDelimiter)-1   as Offset
    ),
    cteTally(N) AS (
        select top (isnull(datalength(@pstring),0)) 
            row_number() over (order by (select null)) 
        from dbo.tally
    ),
    cteStart(N1) AS (--==== Returns starting position of each "delimiter" )
        select 
            t.N 
        from cteTally t cross join Delimiter 
        where substring(@pString, t.N, Delimiter.Length) = @pDelimiter
    ),
    cteValues as (
        select
             ItemNumber = row_number() over(order by N1)
            ,Location   = N1
        from cteStart
    )
    select
         ItemNumber
        ,Location
    from cteValues
go

then running the following SQL will perform the required substitution. Note that the inner join at the end prevents any trailing "odd" tags from being converted:

然后运行以下SQL将执行所需的替换。请注意,最后的内部联接可防止转换任何尾随的“奇数”标记:

create table #t( 
     ItemNo         int             not null
    ,Item           varchar(8000)       null
    ,StartLocation  int             not null
    ,EndLocation    int             not null

    ,constraint PK unique clustered (ItemNo,StartLocation desc)
);

with data(i,s) as ( select i,s from (values
        (1,'Questions about **general computing hardware and software** are off-topic **for Stack Overflow.')
       ,(2,'Questions **about **general computing hardware and software** are off-topic **for Stack Overflow.')
          --....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....0
    )data(i,s)
),
tags as (
    select 
         ItemNo         = data.i
        ,Item           = data.s
        ,Location       = f.Location
        ,IsOpen         = cast((TagNumber % 2) as bit)
        ,Occurrence     = TagNumber
    from data
    cross apply dbo.HtmlTagSPotter(data.s,'**') f
)
insert #t(ItemNo,Item,StartLocation,EndLocation)
select 
     data.ItemNo
    ,data.Item
    ,prev.Location
    ,data.Location
from tags data
join tags prev
   on prev.ItemNo       = data.ItemNo
  and prev.Occurrence = data.Occurrence - 1
  and prev.IsOpen     = 1

union all

select 
    i,s,8001,8002
from data
;

declare @ItemNo     int
       ,@ThisStting varchar(8000);

declare @s varchar(8000);
update this
    set @s = this.Item = case when this.StartLocation > 8000
                              then this.Item
                              else stuff(stuff(@s,this.EndLocation,  2,'</b>')
                                                 ,this.StartLocation,2,'<b>')
                         end
from #t this with (tablockx)
option (maxdop 1);

select
    Item
from (
    select 
         Item
        ,ROW_NUMBER() over (partition by ItemNo order by StartLocation) as rn
    from #t
) t
where rn = 1
go

yielding:

Item
------------------------------------------------------------------------------------------------------------
Questions about <b>general computing hardware and software</b> are off-topic **for Stack Overflow.
Questions <b>about </b>general computing hardware and software<b> are off-topic </b>for Stack Overflow.

#1