如何将该表中的信息更改为易于使用的表单?

I have a legacy product that I have to maintain. One of the table is somewhat similar to the following example:

我有一个遗留产品需要维护。其中一张表格与下面的例子有些相似:

DECLARE @t TABLE
(
 id INT,
 DATA NVARCHAR(30)
);

INSERT  INTO @t
        SELECT  1,
                'name: Jim Ey'
        UNION ALL
        SELECT  2,
                'age: 43'
        UNION ALL
        SELECT  3,
                '----------------'
        UNION ALL
        SELECT  4,
                'name: Johnson Dom'
        UNION ALL
        SELECT  5,
                'age: 34'
        UNION ALL
        SELECT  6,
                '----------------'
        UNION ALL
        SELECT  7,
                'name: Jason Thwe'
        UNION ALL
        SELECT  8,
                'age: 22'

SELECT  *
FROM    @t;
/*
You will get the following result
id          DATA
----------- ------------------------------
1           name: Jim Ey
2           age: 43
3           ----------------
4           name: Johnson Dom
5           age: 34
6           ----------------
7           name: Jason Thwe
8           age: 22
*/

Now I want to get the information in the following form:

现在我想以以下形式获取信息:

name           age
-------------- --------
Jim Ey         43
Johnson Dom    34
Jason Thwe     22

What's the easiest way to do this? Thanks.

最简单的方法是什么?谢谢。

4 个解决方案

#1

Out of (slightly morbid) curiosity I tried to come up with a means of transforming the exact input data you have provided.

出于(有点病态的)好奇心，我试图想出一种方法来转换您提供的确切输入数据。

Far better, of course, would be to properly structure the original data. With a legacy system, this may not be possible, but an ETL process could be created to bring this information into an intermediate location so that an ugly query like this would not need to be run in real time.

当然，更好的办法是正确地构造原始数据。对于遗留系统，这可能是不可能的，但是可以创建一个ETL进程来将这些信息带到一个中间位置，这样就不需要实时运行这种丑陋的查询了。

Example #1

This example assumes that all IDs are consistent and sequential (otherwise, an additional ROW_NUMBER() column or a new identity column would need to be used to guarantee correct remainder operations on ID).

这个例子假设所有的ID都是一致的和连续的(否则，需要使用一个额外的ROW_NUMBER()列或一个新的标识列来保证ID上正确的剩余操作)。

SELECT
    Name = REPLACE( Name, 'name: ', '' ),
    Age = REPLACE( Age, 'age: ', '' )
FROM
(
    SELECT
        Name = T2.Data,
        Age = T1.Data,
        RowNumber = ROW_NUMBER() OVER( ORDER BY T1.Id ASC )

    FROM @t T1 
        INNER JOIN @t T2 ON T1.id = T2.id +1 -- offset by one to combine two rows
    WHERE T1.id % 3 != 0 -- skip delimiter records
) Q1
 -- skip every other record (minus delimiters, which have already been stripped)
WHERE RowNumber % 2 != 0

Example #2: No Dependency on Sequential IDs

This is a more practical example because the actual ID values do not matter, only the row sequence.

这是一个更实际的示例，因为实际的ID值并不重要，只有行序列。

DECLARE @NumberedData TABLE( RowNumber INT, Data VARCHAR( 100 ) );

INSERT @NumberedData( RowNumber, Data )
    SELECT 
        RowNumber = ROW_NUMBER() OVER( ORDER BY id ASC ),
        Data
    FROM @t;

SELECT 
    Name = REPLACE( N2.Data, 'name: ', '' ),
    Age = REPLACE( N1.Data, 'age: ', '' ) 
FROM @NumberedData N1 
    INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;

DELETE @NumberedData;

Example #3: Cursor

Again, it would be best to avoid running a query like this in real time and use a scheduled, transactional ETL process. In my experience, semi-structured data like this is prone to anomalies.

同样，最好避免实时运行这样的查询，并使用预定的事务ETL进程。根据我的经验，像这样的半结构化数据容易出现异常。

While examples #1 and #2 (and the solutions provided by others) demonstrate clever ways of working with the data, a more practical way to transform this data would be a cursor. Why? it may actually perform better (no nested queries, recursion, pivoting, or row numbering) and even if it is slower it provides much better opportunities for error handling.

虽然示例#1和#2(以及其他提供的解决方案)演示了处理数据的巧妙方法，但是转换这些数据的更实际的方法是使用游标。为什么?它实际上可能执行得更好(没有嵌套查询、递归、旋转或行编号)，即使它更慢，它也为错误处理提供了更好的机会。

-- this could be a table variable, temp table, or staging table
DECLARE @Results TABLE ( Name VARCHAR( 100 ), Age INT );

DECLARE @Index INT = 0, @Data VARCHAR( 100 ), @Name VARCHAR( 100 ), @Age INT;

DECLARE Person_Cursor CURSOR FOR SELECT Data FROM @t;
OPEN Person_Cursor;
FETCH NEXT FROM Person_Cursor INTO @Data;

WHILE( 1 = 1 )BEGIN -- busy loop so we can handle the iteration following completion
    IF( @Index = 2 ) BEGIN
        INSERT @Results( Name, Age ) VALUES( @Name, @Age );
        SET @Index = 0;
    END
    ELSE BEGIN
            -- optional: examine @Data for integrity

        IF( @Index = 0 ) SET @Name = REPLACE( @Data, 'name: ', '' );
        IF( @Index = 1 ) SET @Age = CAST( REPLACE( @Data, 'age: ', '' ) AS INT );
        SET @Index = @Index + 1;
    END

    -- optional: examine @Index to see that there are no superfluous trailing 
    -- rows or rows omitted at the end.

    IF( @@FETCH_STATUS != 0 ) BREAK;
    FETCH NEXT FROM Person_Cursor INTO @Data;
END

CLOSE Person_Cursor;
DEALLOCATE Person_Cursor;

Performance

I created sample source data of 100K rows and the three aforementioned examples seem roughly equivalent for transforming data.

我创建了100K行的示例源数据，前面提到的三个示例在转换数据时大致相同。

I created a million rows of source data and a query similar to the following gives excellent performance for selecting a subset of rows (such as would be used in a grid on a web page or a report).

我创建了100万行源数据，类似于下面的查询在选择行的子集(比如在web页面或报表的网格中使用)方面提供了出色的性能。

-- INT IDENTITY( 1, 1 ) numbers the rows for us
DECLARE @NumberedData TABLE( RowNumber INT IDENTITY( 1, 1 ), Data VARCHAR( 100 ) );

-- subset selection; ordering/filtering can be done here but it will need to preserve
-- the original 3 rows-per-result structure and it will impact performance
INSERT @NumberedData( Data )
    SELECT TOP 1000 Data FROM @t;

SELECT
    N1.RowNumber,
    Name = REPLACE( N2.Data, 'name: ', '' ),
    Age = REPLACE( N1.Data, 'age: ', '' ) 
FROM @NumberedData N1 
    INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;

DELETE @NumberedData;

I'm seeing execution times of 4-10ms (i7-3960x) against a set of a million records.

我看到执行时间是4-10ms (i7-3960x)，而记录是100万。

#2

Given that table you can do this:

考虑到这个表格，你可以这样做:

;WITH DATA
AS
(
    SELECT
        SUBSTRING(t.DATA,CHARINDEX(':',t.DATA)+2,LEN(t.DATA)) AS value,
        SUBSTRING(t.DATA,0,CHARINDEX(':',t.DATA)) AS ValueType,
        ID,
        ROW_NUMBER() OVER(ORDER BY ID) AS RowNbr
    FROM
        @t AS t
    WHERE
        NOT t.DATA='----------------'
)
, RecursiveCTE
AS
(
    SELECT
        Data.RowNbr,
        Data.value,
        Data.ValueType,
        NEWID() AS ID
    FROM
        Data
    WHERE
        Data.RowNbr=1
    UNION ALL
    SELECT
        Data.RowNbr,
        Data.value,
        Data.ValueType,
        CASE 
            WHEN Data.ValueType='age'
            THEN RecursiveCTE.ID
            ELSE NEWID()
        END AS ID
    FROM
        Data
        JOIN RecursiveCTE
            ON RecursiveCTE.RowNbr+1=Data.RowNbr
)
SELECT
    pvt.name,
    pvt.age
FROM
    (
        SELECT
            ID,
            value,
            ValueType
        FROM
            RecursiveCTE
    ) AS SourceTable
    PIVOT
    (
        MAX(Value)
        FOR ValueType IN ([name],[age])
    ) AS pvt

Output

输出

Name          Age
------------------
Jim Ey        43
Jason Thwe    22
Johnson Dom   34

#3

One solution without self-joins, recursion and with a single pass over the rows from @t:

一个没有自连接、递归和从@t传递行的解决方案:

SELECT  *
FROM
(
        SELECT  
                CASE 
                    WHEN a.DATA LIKE 'name:%' THEN 'Name'
                    ELSE 'Age'
                END AS Attribute,
                CASE 
                    WHEN a.DATA LIKE 'name:%' THEN SUBSTRING(a.DATA, 7, 4000) --or LTRIM(SUBSTRING(...,6,...))
                    ELSE SUBSTRING(a.DATA, 6, 4000) --or LTRIM(SUBSTRING(...,5,...))
                END AS Value,
                (ROW_NUMBER() OVER(ORDER BY id) + 1) / 2 AS PseudoDenseRank
        FROM    @t a
        WHERE   a.DATA LIKE 'name:%' OR a.DATA LIKE 'age:%'
) b
PIVOT( MAX(b.Value) FOR b.Attribute IN ([Name], [Age]) ) pvt

Results:

结果:

PseudoDenseRank Name        Age
--------------- ----------- ---
1               Jim Ey      43
2               Johnson Dom 34
3               Jason Thwe  22

Note 1: derived table b will group name:% and age:% rows using (ROW_NUMBER() OVER(ORDER BY id) + 1) / 2. Results for derived table b:

注1:派生表b将使用(ROW_NUMBER() /(按id排序)+ 1)/ 2对名称:%和年龄:%行进行分组。派生表b的结果:

Attribute Value       ROW_NUMBER() OVER(ORDER BY id) PseudoDenseRank
--------- ----------- ------------------------------ ---------------
Name      Jim Ey      1                              1
Age       43          2                              1
Name      Johnson Dom 3                              2
Age       34          4                              2
Name      Jason Thwe  5                              3
Age       22          6                              3

Note 2: if the values from id column doesn't have gaps (ex. (id 1, name:Jim Ey), (id 3 age: 43) ) then you can use (a.id + 1) / 2 AS PseudoDenseRank instead of (ROW_NUMBER() OVER(ORDER BY id) + 1) / 2 AS PseudoDenseRank.

注意2:如果id列中的值没有间隙(例如(id 1, name:Jim Ey)， (id 3 age: 43))，那么可以使用(a)。id + 1) / 2为伪锯齿，而不是(ROW_NUMBER() /(按id排序)+ 1)/ 2为伪锯齿。

Note 3: if you use (a.id + 1) / 2 AS PseudoDenseRank solution (to group name and age rows) then the first id value should be an odd number. If the first id value is a even number then you should use this expression a.id / 2 AS PseudoDenseRank.

注3:如果你使用(a)id + 1) / 2作为伪加密解决方案(针对组名和年龄行)，那么第一个id值应该是一个奇数。如果第一个id值是一个偶数，那么您应该使用这个表达式a。id / 2为伪锯齿。

#4

Here's another option if you upgrade to SQL Server 2012, which implements the OVER clause for aggregate functions. This approach will allow you to choose only those tags that you know you want and find them regardless of how many rows there are between names.

如果您升级到SQL Server 2012，这里还有一个选项，它实现聚合函数的OVER子句。这种方法将允许您只选择您知道需要的标记并找到它们，而不管名称之间有多少行。

This will also work if the names and ages are not always in the same order within a group of rows representing a single person.

如果姓名和年龄在一组代表单个人的行中并不总是以相同的顺序排列，那么也可以这样做。

with Ready2Pivot(tag,val,part) as (
  select
    CASE WHEN DATA like '_%:%' THEN SUBSTRING(DATA,1,CHARINDEX(':',DATA)-1) END as tag,
    CASE WHEN DATA like '_%:%' THEN SUBSTRING(DATA,CHARINDEX(':',DATA)+1,8000) END as val,
    max(id * CASE WHEN DATA LIKE 'name:%' THEN 1 ELSE 0 END)
    over (
      order by id
    )
  from @t
  where DATA like '_%:%'
)
  select [name], [age]
  from Ready2Pivot
  pivot (
    max(val)
    for tag in ([name], [age])
  ) as p

If your legacy data has an entry with extra items (say "altName: Jimmy"), this query will ignore it. If your legacy data has no row (and no id number) for someone's age, it will give you NULL in that spot. It will associate all information with the closest preceding row with "name: ..." as the DATA, so it is important that every group of rows has a "name: ..." row.

如果您的遗留数据有一个条目(比如“altName: Jimmy”)，那么这个查询将忽略它。如果您的遗留数据对某人的年龄没有行(而且没有id号)，那么它将在该位置给您NULL。它将把所有的信息与前面最近的行“name:…”关联起来作为数据，所以每组行都有一个“name:…”行是很重要的。

#1

Out of (slightly morbid) curiosity I tried to come up with a means of transforming the exact input data you have provided.

出于(有点病态的)好奇心，我试图想出一种方法来转换您提供的确切输入数据。

Example #1

这个例子假设所有的ID都是一致的和连续的(否则，需要使用一个额外的ROW_NUMBER()列或一个新的标识列来保证ID上正确的剩余操作)。

SELECT
    Name = REPLACE( Name, 'name: ', '' ),
    Age = REPLACE( Age, 'age: ', '' )
FROM
(
    SELECT
        Name = T2.Data,
        Age = T1.Data,
        RowNumber = ROW_NUMBER() OVER( ORDER BY T1.Id ASC )

    FROM @t T1 
        INNER JOIN @t T2 ON T1.id = T2.id +1 -- offset by one to combine two rows
    WHERE T1.id % 3 != 0 -- skip delimiter records
) Q1
 -- skip every other record (minus delimiters, which have already been stripped)
WHERE RowNumber % 2 != 0

Example #2: No Dependency on Sequential IDs

This is a more practical example because the actual ID values do not matter, only the row sequence.

这是一个更实际的示例，因为实际的ID值并不重要，只有行序列。

DECLARE @NumberedData TABLE( RowNumber INT, Data VARCHAR( 100 ) );

INSERT @NumberedData( RowNumber, Data )
    SELECT 
        RowNumber = ROW_NUMBER() OVER( ORDER BY id ASC ),
        Data
    FROM @t;

SELECT 
    Name = REPLACE( N2.Data, 'name: ', '' ),
    Age = REPLACE( N1.Data, 'age: ', '' ) 
FROM @NumberedData N1 
    INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;

DELETE @NumberedData;

Example #3: Cursor

Again, it would be best to avoid running a query like this in real time and use a scheduled, transactional ETL process. In my experience, semi-structured data like this is prone to anomalies.

同样，最好避免实时运行这样的查询，并使用预定的事务ETL进程。根据我的经验，像这样的半结构化数据容易出现异常。

-- this could be a table variable, temp table, or staging table
DECLARE @Results TABLE ( Name VARCHAR( 100 ), Age INT );

DECLARE @Index INT = 0, @Data VARCHAR( 100 ), @Name VARCHAR( 100 ), @Age INT;

DECLARE Person_Cursor CURSOR FOR SELECT Data FROM @t;
OPEN Person_Cursor;
FETCH NEXT FROM Person_Cursor INTO @Data;

WHILE( 1 = 1 )BEGIN -- busy loop so we can handle the iteration following completion
    IF( @Index = 2 ) BEGIN
        INSERT @Results( Name, Age ) VALUES( @Name, @Age );
        SET @Index = 0;
    END
    ELSE BEGIN
            -- optional: examine @Data for integrity

        IF( @Index = 0 ) SET @Name = REPLACE( @Data, 'name: ', '' );
        IF( @Index = 1 ) SET @Age = CAST( REPLACE( @Data, 'age: ', '' ) AS INT );
        SET @Index = @Index + 1;
    END

    -- optional: examine @Index to see that there are no superfluous trailing 
    -- rows or rows omitted at the end.

    IF( @@FETCH_STATUS != 0 ) BREAK;
    FETCH NEXT FROM Person_Cursor INTO @Data;
END

CLOSE Person_Cursor;
DEALLOCATE Person_Cursor;

Performance

I created sample source data of 100K rows and the three aforementioned examples seem roughly equivalent for transforming data.

我创建了100K行的示例源数据，前面提到的三个示例在转换数据时大致相同。

I created a million rows of source data and a query similar to the following gives excellent performance for selecting a subset of rows (such as would be used in a grid on a web page or a report).

我创建了100万行源数据，类似于下面的查询在选择行的子集(比如在web页面或报表的网格中使用)方面提供了出色的性能。

-- INT IDENTITY( 1, 1 ) numbers the rows for us
DECLARE @NumberedData TABLE( RowNumber INT IDENTITY( 1, 1 ), Data VARCHAR( 100 ) );

-- subset selection; ordering/filtering can be done here but it will need to preserve
-- the original 3 rows-per-result structure and it will impact performance
INSERT @NumberedData( Data )
    SELECT TOP 1000 Data FROM @t;

SELECT
    N1.RowNumber,
    Name = REPLACE( N2.Data, 'name: ', '' ),
    Age = REPLACE( N1.Data, 'age: ', '' ) 
FROM @NumberedData N1 
    INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;

DELETE @NumberedData;

I'm seeing execution times of 4-10ms (i7-3960x) against a set of a million records.

我看到执行时间是4-10ms (i7-3960x)，而记录是100万。

#2

Given that table you can do this:

考虑到这个表格，你可以这样做:

;WITH DATA
AS
(
    SELECT
        SUBSTRING(t.DATA,CHARINDEX(':',t.DATA)+2,LEN(t.DATA)) AS value,
        SUBSTRING(t.DATA,0,CHARINDEX(':',t.DATA)) AS ValueType,
        ID,
        ROW_NUMBER() OVER(ORDER BY ID) AS RowNbr
    FROM
        @t AS t
    WHERE
        NOT t.DATA='----------------'
)
, RecursiveCTE
AS
(
    SELECT
        Data.RowNbr,
        Data.value,
        Data.ValueType,
        NEWID() AS ID
    FROM
        Data
    WHERE
        Data.RowNbr=1
    UNION ALL
    SELECT
        Data.RowNbr,
        Data.value,
        Data.ValueType,
        CASE 
            WHEN Data.ValueType='age'
            THEN RecursiveCTE.ID
            ELSE NEWID()
        END AS ID
    FROM
        Data
        JOIN RecursiveCTE
            ON RecursiveCTE.RowNbr+1=Data.RowNbr
)
SELECT
    pvt.name,
    pvt.age
FROM
    (
        SELECT
            ID,
            value,
            ValueType
        FROM
            RecursiveCTE
    ) AS SourceTable
    PIVOT
    (
        MAX(Value)
        FOR ValueType IN ([name],[age])
    ) AS pvt

Output

输出

Name          Age
------------------
Jim Ey        43
Jason Thwe    22
Johnson Dom   34

#3

One solution without self-joins, recursion and with a single pass over the rows from @t:

一个没有自连接、递归和从@t传递行的解决方案:

SELECT  *
FROM
(
        SELECT  
                CASE 
                    WHEN a.DATA LIKE 'name:%' THEN 'Name'
                    ELSE 'Age'
                END AS Attribute,
                CASE 
                    WHEN a.DATA LIKE 'name:%' THEN SUBSTRING(a.DATA, 7, 4000) --or LTRIM(SUBSTRING(...,6,...))
                    ELSE SUBSTRING(a.DATA, 6, 4000) --or LTRIM(SUBSTRING(...,5,...))
                END AS Value,
                (ROW_NUMBER() OVER(ORDER BY id) + 1) / 2 AS PseudoDenseRank
        FROM    @t a
        WHERE   a.DATA LIKE 'name:%' OR a.DATA LIKE 'age:%'
) b
PIVOT( MAX(b.Value) FOR b.Attribute IN ([Name], [Age]) ) pvt

Results:

结果:

PseudoDenseRank Name        Age
--------------- ----------- ---
1               Jim Ey      43
2               Johnson Dom 34
3               Jason Thwe  22

Note 1: derived table b will group name:% and age:% rows using (ROW_NUMBER() OVER(ORDER BY id) + 1) / 2. Results for derived table b:

注1:派生表b将使用(ROW_NUMBER() /(按id排序)+ 1)/ 2对名称:%和年龄:%行进行分组。派生表b的结果:

Attribute Value       ROW_NUMBER() OVER(ORDER BY id) PseudoDenseRank
--------- ----------- ------------------------------ ---------------
Name      Jim Ey      1                              1
Age       43          2                              1
Name      Johnson Dom 3                              2
Age       34          4                              2
Name      Jason Thwe  5                              3
Age       22          6                              3

注意2:如果id列中的值没有间隙(例如(id 1, name:Jim Ey)， (id 3 age: 43))，那么可以使用(a)。id + 1) / 2为伪锯齿，而不是(ROW_NUMBER() /(按id排序)+ 1)/ 2为伪锯齿。

#4

This will also work if the names and ages are not always in the same order within a group of rows representing a single person.

如果姓名和年龄在一组代表单个人的行中并不总是以相同的顺序排列，那么也可以这样做。

with Ready2Pivot(tag,val,part) as (
  select
    CASE WHEN DATA like '_%:%' THEN SUBSTRING(DATA,1,CHARINDEX(':',DATA)-1) END as tag,
    CASE WHEN DATA like '_%:%' THEN SUBSTRING(DATA,CHARINDEX(':',DATA)+1,8000) END as val,
    max(id * CASE WHEN DATA LIKE 'name:%' THEN 1 ELSE 0 END)
    over (
      order by id
    )
  from @t
  where DATA like '_%:%'
)
  select [name], [age]
  from Ready2Pivot
  pivot (
    max(val)
    for tag in ([name], [age])
  ) as p

秒客网

如何将该表中的信息更改为易于使用的表单?

4 个解决方案

#1

Example #1

Example #2: No Dependency on Sequential IDs

Example #3: Cursor

Performance

#2

#3

#4

#1

Example #1

Example #2: No Dependency on Sequential IDs

Example #3: Cursor

Performance

#2

#3

#4

相关文章