Creating a SQL view from comma-separated values

Time: 2021-02-04 00:20:51

T-SQL question: I need help building a join between 2 tables, where one of the tables holds aggregated data (comma-separated values).

I have a table, Users, with 3 columns: UserId, DefaultLanguage and OtherLanguages.

The table looks like this:

UserId  | DefaultLanguage  |  OtherLanguages
---------------------------------------------
   1    |      en          |       NULL
   2    |      en          |       it, fr
   3    |      fr          |       en, it
   4    |      en          |       sp

and so on.

I have another table where I have the association between language code (en, fr, ro, it, sp) and language name:

 LangCode  | LanguageName
-------------------------
    en     | English
    fr     | French
    it     | Italian
    sp     | Spanish

and so on.
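
For anyone who wants to reproduce this, the two tables could be set up roughly as follows. This is only a sketch: the name Languages for the second table and all column types are assumptions made for illustration; only Users and the column names come from the description above.

```sql
-- Sketch of the two tables from the question.
-- "Users" is named in the question; "Languages" and all column types are assumptions.
CREATE TABLE dbo.Languages (
    LangCode     varchar(2)  NOT NULL PRIMARY KEY,
    LanguageName varchar(50) NOT NULL
);

CREATE TABLE dbo.Users (
    UserId          int          NOT NULL PRIMARY KEY,
    DefaultLanguage varchar(2)   NOT NULL,
    OtherLanguages  varchar(100) NULL  -- comma-separated codes, e.g. 'it, fr'
);

-- Only the rows shown in the question are inserted.
INSERT INTO dbo.Languages (LangCode, LanguageName) VALUES
    ('en', 'English'), ('fr', 'French'), ('it', 'Italian'), ('sp', 'Spanish');

INSERT INTO dbo.Users (UserId, DefaultLanguage, OtherLanguages) VALUES
    (1, 'en', NULL), (2, 'en', 'it, fr'), (3, 'fr', 'en, it'), (4, 'en', 'sp');
```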

I want to create a view like this:

UserId  | DefaultLanguage  |  OtherLanguages
---------------------------------------------
   1    |    English       |    NULL
   2    |    English       |    Italian, French
   3    |    French        |    English, Italian
   4    |    English       |    Spanish

and so on.

In short, I need a view where the language code is replaced by language name.

Any help, please?

3 solutions

#1


2  

There are several solutions. Of course, you could also recreate the tables and change the data structure.

1. If all the language codes are 2 characters long:

-- Handles at most three other languages per user.
-- Positions 1, 4 and 7 assume 2-character codes once the spaces are removed.
select t1.UserId, t2.LanguageName DefaultLanguage,
ISNULL( t3.LanguageName, '') + ISNULL(', '+t4.LanguageName, '') + ISNULL( ', '+t5.LanguageName, '') OtherLanguages
from Table1 t1 
inner join Table2 t2 on t1.DefaultLanguage = t2.LangCode
left join Table2 t3 on Left(t1.OtherLanguages,2) = t3.LangCode
left join Table2 t4 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 3 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 4, 2) ELSE null END = t4.LangCode
left join Table2 t5 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 6 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 7, 2) ELSE null END = t5.LangCode

2. Use a user-defined function:

CREATE FUNCTION [dbo].[func_GetLanguageName] (@pLanguageList varchar(max))
RETURNS varchar(max) AS
BEGIN
    Declare @aLanguageList varchar(max) = @pLanguageList
    Declare @aLangCode varchar(max) = null
    Declare @aReturnName varchar(max) = null
    Declare @aCommaPos int
    -- LEN(NULL) > 0 is not true, so a NULL list skips the loop and NULL is returned
    WHILE LEN(@aLanguageList) > 0
    BEGIN
        SET @aCommaPos = CHARINDEX(',', @aLanguageList)
        IF @aCommaPos > 0
        BEGIN
            -- Take the code before the first comma, then drop it and the comma from the list
            SET @aLangCode = RTRIM(LTRIM(LEFT(@aLanguageList, @aCommaPos - 1)))
            SET @aLanguageList = LTRIM(SUBSTRING(@aLanguageList, @aCommaPos + 1, LEN(@aLanguageList)))
        END
        ELSE
        BEGIN
            -- Last (or only) code: use it and end the loop
            SET @aLangCode = RTRIM(LTRIM(@aLanguageList))
            SET @aLanguageList = NULL
        END
        -- Append the matching name; codes with no match in Table2 are skipped
        Select @aReturnName = ISNULL( @aReturnName + ', ' , '') + LanguageName from Table2 where LangCode=@aLangCode
    END
    RETURN(@aReturnName)
END

and use it in a select:

select UserId, dbo.func_GetLanguageName(DefaultLanguage) DefaultLanguage, dbo.func_GetLanguageName(OtherLanguages) OtherLanguages from Table1
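
Since the question asks for a view, the select above could be wrapped roughly like this. It is only a sketch: the view name vw_UserLanguageNames is made up, and Table1 stands for the users table as in the select.

```sql
-- vw_UserLanguageNames is a hypothetical name; Table1 is the users table used above.
CREATE VIEW dbo.vw_UserLanguageNames
AS
SELECT
    UserId,
    dbo.func_GetLanguageName(DefaultLanguage) AS DefaultLanguage,
    dbo.func_GetLanguageName(OtherLanguages)  AS OtherLanguages
FROM dbo.Table1;
```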

#2


1  

Best practice would dictate not to have this type of comma delimited data in a column...
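
For reference, the normalized layout that best practice points to might look something like the sketch below. All names here (UserLanguages, IsDefault) are illustrative assumptions, and Languages is an assumed name for the lookup table with the LangCode and LanguageName columns.

```sql
-- Hypothetical normalized design: one row per user and language.
-- Table and column names are illustrative assumptions, not part of the question.
CREATE TABLE dbo.UserLanguages (
    UserId    int        NOT NULL,
    LangCode  varchar(2) NOT NULL,
    IsDefault bit        NOT NULL DEFAULT 0,
    PRIMARY KEY (UserId, LangCode)
);

-- With that layout, resolving names is a plain join and needs no string parsing.
SELECT ul.UserId, l.LanguageName, ul.IsDefault
FROM dbo.UserLanguages AS ul
JOIN dbo.Languages AS l
    ON l.LangCode = ul.LangCode;
```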

Since you stated in comments that the schema cannot be changed, the next best thing is a function. This can be used in a select query in-line.

SQL is notoriously slow with string manipulation. Here is an interesting article on the topic. There are many SQL "string split" functions out there. They all generally split a comma delimited string and return a table.

For this specific use-case, you actually need a scalar-valued function (a function which returns one value) rather than a table-valued function (one which returns a table of values).

Below is such a function, modified to return a scalar value in place of the original comma-delimited string of language codes.

The comments explain what is happening line by line.

The gist is that you must loop through the input string, keeping track of the last comma location, extract each code, look up the full language name in the languages table, and then return the output as a comma-delimited string.

Language codes to languages function:

Create Function [dbo].fn_languageCodeToFull
    ( @Input Varchar(100) )
    Returns Varchar(1000)
As
Begin
    -- To address null input, based on the example you provided, we set the output to NULL if there is no input
    If @Input = '' Or @Input Is Null 
        Return Null

    Declare 
        @CodeLength int, -- constant for code length to avoid hardcoded "magic numbers"
        @Output varchar(1000), -- will contain the final comma delimited string of full languages
        @LastIndex int, -- tracks the location of the input we are searching as we loop over the string
        @CurrentCode varchar(2), -- for code readability, we extract each language code to this variable
        @CurrentLanguage varchar(50), -- for code readability, we store the full language in this variable
        @IndexIncrement int -- constant to increment the search index by 1 at each iteration
                            -- ensuring the loop moves forward

    Set @LastIndex = 0  -- seed the index; CHARINDEX treats a start position of 0 as 1
    Set @CodeLength = 2 -- the language codes in this example are always 2 characters in length
    Set @Output = '' -- seed with empty string to avoid NULL when concatenating
    Set @IndexIncrement = 1 -- again avoiding hardcoded values...

    -- We will loop until we have gone to or beyond the length of the input string
    While @LastIndex < len(@Input)
        Begin
            -- Set the index of each comma (charindex is 1-based)
            Set @LastIndex = CHARINDEX(',', @Input, @LastIndex)
            -- When we get to the last item, CharIndex will return 0 when it does not find a comma. 
            -- To pull the last item, we will artificially set @LastIndex to be 1 greater than the input string
            -- This will allow the code following this line to be unaltered for this scenario
            If @LastIndex = 0 set @LastIndex = len(@Input) + 1 -- account for 1-based index of substring
            -- Extract the code prior to the current comma that charindex has identified
            -- (this assumes each code sits directly before its comma, as in the sample data)
            Set @CurrentCode = substring(@Input, @LastIndex - @CodeLength, @CodeLength)
            -- Do a lookup to get the language for the current code
            -- (Languages is an assumed table name; LangCode is the column from the question)
            Set @CurrentLanguage = (Select LanguageName From Languages Where LangCode = @CurrentCode)
            -- Skip unknown codes, since concatenating NULL would wipe out the whole output
            If @CurrentLanguage Is Not Null
                Begin
                    -- Only add a separator once the output already holds a language
                    If @Output <> '' Set @Output = @Output + ', '
                    -- Here we build the Output string with the language
                    Set @Output = @Output + @CurrentLanguage
                End

            -- Finally, we increment @LastIndex by 1 to avoid loop on first instance of comma
            Set @LastIndex = @LastIndex + @IndexIncrement
        End
    Return @Output
End

Then your view would simply do something like:

Sample view using the function:

Create View vw_UserLanguages
As
    Select 
        UserId,
        dbo.fn_languageCodeToFull(DefaultLanguage) as DefaultLanguage,
        dbo.fn_languageCodeToFull(OtherLanguages) as OtherLanguages
    From Users -- the table name given in the question

Note that the function will work whether there are commas or not, so there is no need to join the Languages table here as you can just have the function do all the work in this case.
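
As an aside to the string-split functions mentioned earlier in this answer: on SQL Server 2017 or later, the built-in STRING_SPLIT and STRING_AGG functions could probably do the same job without a hand-written loop. The following is a sketch under those assumptions, not a drop-in replacement. The table names dbo.Users and dbo.Languages are assumed, and STRING_SPLIT does not guarantee that the names come back in the original order of the codes.

```sql
-- Assumes SQL Server 2017+ (STRING_SPLIT needs 2016+, STRING_AGG needs 2017+).
-- dbo.Users and dbo.Languages are assumed table names.
SELECT
    u.UserId,
    d.LanguageName AS DefaultLanguage,
    o.OtherLanguages
FROM dbo.Users AS u
JOIN dbo.Languages AS d
    ON d.LangCode = u.DefaultLanguage
OUTER APPLY (
    -- Split the code list into rows, map each code to its name, and re-join the names.
    -- A NULL list yields no rows, so the aggregate returns NULL for that user.
    SELECT STRING_AGG(l.LanguageName, ', ') AS OtherLanguages
    FROM STRING_SPLIT(u.OtherLanguages, ',') AS s
    JOIN dbo.Languages AS l
        ON l.LangCode = LTRIM(RTRIM(s.value))
) AS o;
```

This version is set-based, so it avoids calling a scalar function once per row, at the cost of requiring a newer server version.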

#3


1  

One quick and dirty solution would be to use nested REPLACE calls, but that could result in a complex, long-winded statement, especially if you have more than five languages.

As an example:

SELECT [UserId],[DefaultLanguage],
CASE 
  WHEN [OtherLanguages] IS NULL THEN ''
  ELSE REPLACE(
    REPLACE(
    REPLACE(
    REPLACE(
    REPLACE([OtherLanguages],
    'en','English'),
    'fr','French'),
    'it','Italian'),
    'ro','Romulan'), --Probably not the intended language ;-)
    'sp','Spanish')
END as [OtherLanguages]  
FROM YourTable

Personally, I'd create a scalar function, again using the REPLACE command. Inside it you can check the number of languages present and add a counter so that you're not doing unnecessary lookups.

SELECT [UserId],[DefaultLanguage],
CASE 
  WHEN [OtherLanguages] IS NULL THEN ''
  WHEN [OtherLanguages] = '' THEN ''
  ELSE dbo.function_name([OtherLanguages]) -- placeholder name for your scalar function
END as [OtherLanguages]  
FROM YourTable

It might not be good practice, but there are times when it is more efficient to store multiple values in a single field. Just accept that when you do, it will slow down the way you handle that data.
