Sort by the first matching number, then by the second matching number, and so on

Time: 2022-10-14 17:54:16

Sort by first matching number, then by second matching number, in SQL

Suppose I have table entries as follows.

Btc0504
Btc_0007_Shd_01
Btc_007_Shd_01
Bcd0007_Shd_7
ptc00044
Brg0007_Shd_6
Btc0075_Shd
Bcc43
MR_Tst_etc0565
wtc0004_Shd_4
vtc_Btc0605

It should return the records in the following order.

wtc0004_Shd_4
Bcc43
ptc00044
Btc_007_Shd_01
Btc_0007_Shd_01
Brg0007_Shd_6
Bcd0007_Shd_7
Btc0075_Shd
Btc0504
MR_Tst_etc0565
Btc_vtc0605

So basically it sorts by the numbers only; the words merely separate the numbers.

There can be any number of strings in the middle.

They are not fixed, and the pattern is not fixed either.

So a row can contain more strings and numbers, e.g. a1b2c3d4e5..., u7g2u9w2s8...

So I need a dynamic solution.

An example table is given below.

http://rextester.com/IDQ22263

5 solutions

#1


2  

Assuming you have at most 2 number blocks and each number has at most 10 digits, I created a sample CLR UDF for you (DbProject, a SQL CLR Database project):

using System.Collections.Generic;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString CustomStringParser(SqlString str)
    {
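        int depth = 2; // 2 numbers at most
        int width = 10; // 10 digits at most

        // Treat a NULL input as a string that contains no numbers.
        string input = str.IsNull ? string.Empty : str.Value;

        List<string> numbers = new List<string>();
        // Each run of digits becomes one fixed-width, zero-padded block.
        foreach (Match match in Regex.Matches(input, @"[0-9]+"))
        {
            // long.Parse, because a 10-digit number can exceed int.MaxValue.
            numbers.Add(long.Parse(match.Value).ToString().PadLeft(width, '0'));
        }
        // Pad with spaces to a fixed total length of depth * width characters.
        return string.Join("", numbers.ToArray()).PadRight(depth * width);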
    }
}

I added this to the 'test' database as follows:


IF EXISTS ( SELECT  *
            FROM    sys.objects
            WHERE   object_id = OBJECT_ID(N'[dbo].[ufn_MyCustomParser]') AND
                    type IN ( N'FN', N'IF', N'TF', N'FS', N'FT' ) )
  DROP FUNCTION [dbo].[ufn_MyCustomParser]
GO
IF EXISTS ( SELECT  *
            FROM    sys.[assemblies] AS [a]
            WHERE   [a].[name] = 'DbProject' AND
                    [a].[is_user_defined] = 1 )
  DROP ASSEMBLY DbProject;
GO


CREATE ASSEMBLY DbProject
FROM 'C:\SQLCLR\DbProject\DbProject\bin\Debug\DbProject.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION ufn_MyCustomParser ( @csv NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME
  DbProject.[UserDefinedFunctions].CustomStringParser;
GO

Note: this was tested on SQL Server 2012. SQL Server 2017 enables the 'clr strict security' option by default, which you need to handle before the assembly can be created.
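For readers who hit that restriction, a minimal sketch of the server settings involved might look like the following. It assumes a development or test instance where relaxing the check is acceptable; on a production server, signing the assembly is usually the safer route. The option names are the standard sp_configure ones, but whether to change them is your call.

```sql
-- Sketch for a dev/test instance only; requires sysadmin-level permissions.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- CLR integration must be enabled before any CLR function can run.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- SQL Server 2017 and later: relax the strict check so that a SAFE,
-- unsigned assembly can be created. Skip this on 2012/2014/2016,
-- where the option does not exist.
EXEC sp_configure 'clr strict security', 0;
RECONFIGURE;
```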

Finally tested with this T-SQL:


declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_01'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_BwwwQAZtc0605'),
('Btc_Bwwwwe12541edddddtc0605'),
('QARTa1b2');
SELECT * FROM @MyTable
ORDER BY dbo.ufn_MyCustomParser(col1);

Output:


col1
QARTa1b2
Btc0004_Shd_4
Btc0007_Shd_01
Btc0007_Shd_6
Btc0007_Shd_7
Btc0504
MR_Tst_Btc0565
Btc_BwwwQAZtc0605
Btc_Bwwwwe12541edddddtc0605

#2


1  

The query below uses the PATINDEX function to find the position of a pattern within a string:

  1. First, it finds the beginning of the first number by searching for a digit.

  2. Second, it finds the end of that number by searching for a digit followed by a non-digit.

With both positions, we have everything we need to extract the number from the string and sort by it after casting it to an integer.

Try this query:

declare @tbl table (col1 varchar(50));
insert into @tbl values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_Btc0605');

select col1 from (
    select col1,
           PATINDEX('%[0-9]%', col1) [startIndex],
           case PATINDEX('%[0-9][^0-9]%', col1)
               when 0 then LEN(col1)
               else PATINDEX('%[0-9][^0-9]%', col1)
           end [endIndex]
    from @tbl
) [a]
order by CAST(SUBSTRING(col1, startIndex, endIndex - startIndex + 1) as int)
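To make the two positions concrete, here is a small sketch that evaluates the same expressions on a single literal. The value 'Btc0007_Shd_7' comes from the sample data; the variable name and the column aliases are only for illustration.

```sql
DECLARE @s VARCHAR(50) = 'Btc0007_Shd_7';

SELECT PATINDEX('%[0-9]%', @s)       AS startIndex,   -- 4: position of the first digit
       PATINDEX('%[0-9][^0-9]%', @s) AS endIndex,     -- 7: last digit of the first run ('7_')
       CAST(SUBSTRING(@s, 4, 7 - 4 + 1) AS INT) AS firstNumber;  -- '0007' -> 7
```

Note that only the first run of digits takes part in the ordering here; the trailing _7 is ignored, so rows that share the same first number are returned in no guaranteed order.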

I came up with another solution, which is very compact and more general:


;with cte as (
    select 1 [n], col1, STUFF(col1, PATINDEX('%[^0-9]%', col1), 1, '.') refined_col1 from @tbl
    union all
    select n+1, col1, STUFF(refined_col1, PATINDEX('%[^0-9.]%', refined_col1), 1, '.') from cte
    where n < 100 -- this limit must exceed the largest number of non-digit characters in any col1 value, so that every unnecessary character is replaced
)

select col1, refined_col1 from cte
where PATINDEX('%[^0-9.]%', refined_col1) = 0
order by CAST(replace(refined_col1, '.', '') as int)
option (maxrecursion 0)
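As a rough illustration of what each recursion step does, the replacement can be traced by hand on one value. This sketch only uses literals from the sample data and is not part of the original answer.

```sql
-- Anchor step: the first non-digit becomes a dot.
SELECT STUFF('Btc0504', PATINDEX('%[^0-9]%', 'Btc0504'), 1, '.');    -- '.tc0504'

-- Recursive steps: the next character that is neither a digit nor a dot becomes a dot.
SELECT STUFF('.tc0504', PATINDEX('%[^0-9.]%', '.tc0504'), 1, '.');   -- '..c0504'
SELECT STUFF('..c0504', PATINDEX('%[^0-9.]%', '..c0504'), 1, '.');   -- '...0504'

-- Final step: strip the dots and cast what is left.
SELECT CAST(REPLACE('...0504', '.', '') AS INT);                     -- 504
```

One thing to keep in mind: because all dots are removed before the cast, the digits of every block are concatenated. For 'Btc0007_Shd_7' the refined value is '...0007.....7', which becomes '00077' and sorts as 77 rather than as 7 followed by 7.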

#3


0  

I will begin my answer by saying that the best long-term solution for you is to fix your data model. If you need to use the various portions of the entry in queries, for sorting, etc., then consider storing them in separate bona fide columns.

That being said, one workaround is to use basic string operations to extract the two components you want to use for sorting. Note carefully that we have to cast them to numbers, because otherwise they won't sort properly as text.

SELECT *
FROM entries
ORDER BY
    CAST(SUBSTRING(entry, PATINDEX('%Btc[0-9]%', entry) + 3, 4) AS INT),
    CASE WHEN CHARINDEX('Shd_', entry) > 0
         THEN
         -- Take everything after 'Shd_': the remaining length is LEN - position - 3.
         CAST(SUBSTRING(entry,
                        CHARINDEX('Shd_', entry) + 4,
                        LEN(entry) - CHARINDEX('Shd_', entry) - 3) AS INT)
         ELSE 1 END;


Demo

#4


0  

You can use a tally table (numbers table) to split each value into characters, keep only the digits, and then concatenate those digits into a string (which can be cast to bigint). Then you can order by this value.

See working demo


; with numbers as (
    select top 10000
        r= row_number() over( order by (select null))
    from sys.objects o1 
        cross join sys.objects o2
   )

, onlynumbers as
(
    select * from t 
    cross apply
    ( select part =substring(num,r,1),r
      from numbers where r<=len(num)
     )y
    where part  like '[0-9]' 
)

, finalorder as
(
    select num,cast(replace(stuff
    ((
        select ','+part
        from onlynumbers o2 
        where o2.num=o1.num
        order by o2.r
        for xml path('')
        ),1,1,''),',','') as bigint) b
  from onlynumbers o1
  group by num
   )
 select num from finalorder order by b asc
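The query above reads from a table t with a column num, which is presumably defined in the linked demo and is not shown here. To try it locally, a hypothetical setup along these lines should be enough; the table definition is an assumption, and the rows are borrowed from the other answers.

```sql
-- Hypothetical setup: the names t and num match the query above,
-- but the column type and the rows are assumptions for illustration.
CREATE TABLE t (num VARCHAR(50));

INSERT INTO t (num) VALUES
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_Btc0605');
```

Rows without any digit never reach the onlynumbers CTE, so they would be missing from the result.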

#5


0  

To begin with, I do not recommend the following approach for performance reasons; you should fix the root cause in your data.

To handle dynamic inputs, I think you should create a UDF that extracts only the digits, like this:

CREATE FUNCTION dbo.udf_ExtratcNumbersOnly
(@string VARCHAR(256))
RETURNS int
AS
BEGIN
    WHILE PATINDEX('%[^0-9]%',@string) <> 0
    SET @string = STUFF(@string,PATINDEX('%[^0-9]%',@string),1,'')
    RETURN cast (@string as int)
END
GO

Then use it as follows:

declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_BwwwQAZtc0605'),
('Btc_Bwwwwe12541edddddtc0605'),
('QARTa1b2c3d4e5');

select * from @MyTable 
order by (dbo.udf_ExtratcNumbersOnly(col1))

Result:

Btc0004_Shd_4
Btc0007_Shd_6
Btc0007_Shd_7
Btc0504
MR_Tst_Btc0565
Btc_BwwwQAZtc0605
QARTa1b2c3d4e5
Btc_Bwwwwe12541edddddtc0605

Demo.

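Since the function above concatenates all digits, 'QARTa1b2c3d4e5' is compared as 12345 and lands after 605, even though its first number is 1. If the goal is to compare block by block (first number, then second, and so on), one possible plain T-SQL sketch is to build a key from zero-padded blocks, similar in spirit to the CLR function in answer #1. The function name dbo.ufn_NumericSortKey and the 10-digit block width are assumptions for illustration, not something from the original answers.

```sql
-- Hypothetical helper: turns every run of digits into a 10-character,
-- zero-padded block and concatenates the blocks into one sortable key.
-- Assumes no single run of digits is longer than 10 characters.
CREATE FUNCTION dbo.ufn_NumericSortKey (@s VARCHAR(256))
RETURNS VARCHAR(4000)
AS
BEGIN
    DECLARE @key VARCHAR(4000) = '';
    DECLARE @start INT, @len INT;

    SET @start = PATINDEX('%[0-9]%', @s);
    WHILE @start > 0
    BEGIN
        -- Drop everything before the next digit.
        SET @s = SUBSTRING(@s, @start, LEN(@s));
        -- Length of the leading run of digits; the appended 'x'
        -- guarantees that a run at the end of the string terminates.
        SET @len = PATINDEX('%[^0-9]%', @s + 'x') - 1;
        -- Left-pad the run with zeros to a fixed width of 10.
        SET @key = @key + RIGHT(REPLICATE('0', 10) + LEFT(@s, @len), 10);
        -- Continue with the rest of the string.
        SET @s = SUBSTRING(@s, @len + 1, LEN(@s));
        SET @start = PATINDEX('%[0-9]%', @s);
    END
    RETURN @key;
END
GO

declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('QARTa1b2c3d4e5');

select col1, dbo.ufn_NumericSortKey(col1) as sort_key
from @MyTable
order by dbo.ufn_NumericSortKey(col1);
```

Under these assumptions 'QARTa1b2c3d4e5' should sort first, because its first block is 0000000001. Like any scalar UDF in an ORDER BY, this is evaluated row by row, so the performance caveat above still applies.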
