Splitting a delimited string value into rows

Time: 2021-06-29 22:07:31

An external data vendor wants to give me a data field that holds a pipe-delimited string value, which I find quite difficult to deal with.

Without help from an application programming language, is there a way to transform the string value into rows?

There is one difficulty, however: the field has an unknown number of delimited elements.

The DB engine in question is MySQL.

For example:

Input: Tuple(1, "a|b|c")

Output:

Tuple(1, "a")
Tuple(1, "b")
Tuple(1, "c")

3 Solutions

#1


3  

It may not be as difficult as I initially thought.

This is a general approach:

  1. Count the number of occurrences of the delimiter: length(val) - length(replace(val, '|', '')). The number of elements is that count plus one.
  2. Loop that many times, each time grabbing the next delimited value and inserting it into a second table.
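
As a quick check of step 1, the counting expression can be tried on the sample value from the question. This is only a sketch: the user variable `@val` is a hypothetical stand-in for the vendor's column.

```sql
-- Hypothetical sample value; in practice this would be the delimited column.
SET @val = 'a|b|c';

SELECT
    LENGTH(@val) - LENGTH(REPLACE(@val, '|', ''))     AS delimiter_count,  -- 2
    LENGTH(@val) - LENGTH(REPLACE(@val, '|', '')) + 1 AS element_count;    -- 3
```

This works as written because the delimiter is a single character. For a longer delimiter, the difference would have to be divided by the delimiter's length.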

#2


3  

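Use this function by Federico Cargnelutti:

 CREATE FUNCTION SPLIT_STR(
   x VARCHAR(255),
   delim VARCHAR(12),
   pos INT
 )
 RETURNS VARCHAR(255)
 -- Declared DETERMINISTIC so that it can be created when binary logging is enabled.
 DETERMINISTIC
 -- Take everything up to the pos-th delimiter, cut off the part before
 -- the (pos - 1)-th delimiter, then strip the leftover delimiter.
 -- LENGTH() counts bytes, so this assumes single-byte characters.
 RETURN REPLACE(
   SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
             LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1),
   delim, '');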

Usage:

 SELECT SPLIT_STR(string, delimiter, position)

The position is 1-based. You will need a loop to solve your problem, since each call returns only one element.
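
For illustration, here is how the function would behave on the sample value from the question, assuming SPLIT_STR has been created as shown above. The literal arguments are just example inputs.

```sql
SELECT SPLIT_STR('a|b|c', '|', 1);  -- 'a'
SELECT SPLIT_STR('a|b|c', '|', 2);  -- 'b'
SELECT SPLIT_STR('a|b|c', '|', 3);  -- 'c'
SELECT SPLIT_STR('a|b|c', '|', 4);  -- '' (there is no fourth element)
```

The empty string returned past the last element is what a loop can use as its stopping condition.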

#3


0  

Although your issue was probably resolved long ago, I was looking for a solution to the very same problem. I solved it with the help of a procedure referenced here, slightly adapted to handle multi-byte characters (such as the German umlauts) in the string by using CHAR_LENGTH() instead of LENGTH().
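
To see why the switch matters, the two length functions can be compared on a value whose first element contains umlauts. This is a small sketch that assumes the client and connection character set is utf8mb4, where 'ä' takes two bytes; `@val` is a hypothetical sample value.

```sql
-- Assumes a utf8mb4 connection, so each 'ä' is 2 bytes but 1 character.
SET @val = 'ää|bc';

SELECT
    LENGTH(SUBSTRING_INDEX(@val, '|', 1))      AS byte_len,  -- 4
    CHAR_LENGTH(SUBSTRING_INDEX(@val, '|', 1)) AS char_len,  -- 2
    -- SUBSTRING() counts characters, so a byte-based offset overshoots:
    SUBSTRING(@val, LENGTH(SUBSTRING_INDEX(@val, '|', 1)) + 1)      AS cut_by_bytes,  -- 'c'
    SUBSTRING(@val, CHAR_LENGTH(SUBSTRING_INDEX(@val, '|', 1)) + 1) AS cut_by_chars;  -- '|bc'
```

With the byte-based offset the second element loses its first character, while the character-based offset leaves '|bc', from which the delimiter is then stripped.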

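DELIMITER $$
    -- Return the pos-th (1-based) element of val, or NULL if there is none.
    -- Declared DETERMINISTIC so that it can be created when binary logging is enabled.
    CREATE FUNCTION SPLIT_STRING(val TEXT, delim VARCHAR(12), pos INT) RETURNS TEXT
    DETERMINISTIC
    BEGIN
        DECLARE output TEXT;
        -- CHAR_LENGTH() counts characters, matching SUBSTRING()'s positions.
        SET output = REPLACE(SUBSTRING(SUBSTRING_INDEX(val, delim, pos), CHAR_LENGTH(SUBSTRING_INDEX(val, delim, pos - 1)) + 1), delim, '');
        IF output = '' THEN
            SET output = NULL;
        END IF;
        RETURN output;
    END $$

    -- Copy the i-th element of every row into NewTuple, for i = 1, 2, ...,
    -- until a pass inserts no rows.
    CREATE PROCEDURE TRANSFER_CELL()
    BEGIN
        DECLARE i INTEGER DEFAULT 1;
        DECLARE inserted INTEGER;
        REPEAT
            INSERT INTO NewTuple (id, value)
            SELECT id, SPLIT_STRING(value, '|', i)
            FROM Tuple
            WHERE SPLIT_STRING(value, '|', i) IS NOT NULL;
            -- Capture the count right away: ROW_COUNT() reflects only the
            -- immediately preceding statement.
            SET inserted = ROW_COUNT();
            SET i = i + 1;
        UNTIL inserted = 0
        END REPEAT;
    END $$
DELIMITER ;

CALL TRANSFER_CELL();

DROP FUNCTION SPLIT_STRING;
DROP PROCEDURE TRANSFER_CELL;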
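
To try the procedure, both tables have to exist before the CALL. A minimal setup might look like the following. Only the table and column names (Tuple, NewTuple, id, value) come from the procedure; the column types are assumptions.

```sql
-- Hypothetical table definitions; the column types are guesses.
CREATE TABLE Tuple    (id INT, value TEXT);
CREATE TABLE NewTuple (id INT, value TEXT);

-- Sample row from the question.
INSERT INTO Tuple (id, value) VALUES (1, 'a|b|c');

-- After creating the function and procedure and running CALL TRANSFER_CELL(),
-- the result can be inspected with:
SELECT id, value FROM NewTuple ORDER BY id, value;
```

The intended result, per the question, is three rows for id 1 with the values 'a', 'b' and 'c'.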
