I want to extract a word from a string column of a table.
description
===========================
abc order_id: 2 xxxx yyy aa
mmm order_id: 3 nn kk yw
Expected result set
order_id
===========================
2
3
The table will have at most 100 rows, the text is ~256 characters long, and the column always contains one order_id. So performance is not an issue.
In Oracle, I can use REGEXP_SUBSTR for this problem. How would I solve this in MySQL?
Edit 1
I am using LOCATE and SUBSTR to solve the problem, but the code is ugly. Ten minutes after writing it, I am cursing the guy who wrote such ugly code.
I didn't find a REGEXP_SUBSTR function in the MySQL docs, but I am hoping that it exists.
Answer to: Why can't the table be optimized? Why is the data stored in such a dumb fashion?
The example I gave just illustrates the problem I am trying to solve. In the real scenario, I am using DB-based third-party queuing software to execute asynchronous tasks. The queue serializes Ruby objects as text, and I have no control over the table structure or the data format. Tasks in the queue can be recurring. In our test setup, some of the recurring tasks fail because of stale data, and I have to delete these tasks to prevent the error. Such errors are not common, so I don't want to maintain a normalized shadow table.
4 answers
#1
"I didn't find the REGEXP_SUBSTR function in MySQL docs. But I am hoping that it exists.."
Yes, it is supported starting with MySQL 8.0. From the manual's Regular Expressions section:
REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])
Returns the substring of the string expr that matches the regular expression specified by the pattern pat, NULL if there is no match. If expr or pat is NULL, the return value is NULL.
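To make the optional pos and occurrence arguments concrete, here is a small Python sketch that imitates the documented behaviour. It is not MySQL's implementation: Python's re module stands in for MySQL 8.0's ICU regex engine, and match_type is left out. The loop mirrors what a query along the lines of REGEXP_SUBSTR(description, '[0-9]+', LOCATE('order_id:', description)) would be expected to return on MySQL 8.0, namely the first run of digits found after the order_id: marker.

```python
import re
from typing import Optional


def regexp_substr(expr: Optional[str], pat: Optional[str],
                  pos: int = 1, occurrence: int = 1) -> Optional[str]:
    """Rough stand-in for MySQL 8.0 REGEXP_SUBSTR(expr, pat[, pos[, occurrence]]).

    pos and occurrence are 1-based as in MySQL; match_type is not modelled.
    Python's `re` engine is used here, while MySQL 8.0 uses ICU, so unusual
    patterns may not behave identically.
    """
    if expr is None or pat is None:
        return None                      # NULL argument -> NULL result
    if pos < 1 or occurrence < 1:
        raise ValueError("this sketch only accepts pos >= 1 and occurrence >= 1")
    # Search from the 1-based start position and return the N-th match.
    for n, m in enumerate(re.finditer(pat, expr[pos - 1:]), start=1):
        if n == occurrence:
            return m.group(0)
    return None                          # no match -> NULL


rows = [
    "abc order_id: 2 xxxx yyy aa",
    "mmm order_id: 3 nn kk yw",
    "mmm order_id: 1523 nn kk yw",
]

# Start the digit search at the 'order_id:' marker; str.find is 0-based
# while MySQL positions are 1-based, hence the + 1.
for description in rows:
    print(regexp_substr(description, r"[0-9]+", description.find("order_id:") + 1))
# prints 2, 3 and 1523, one per line
```

Starting the search at the marker, rather than relying on the digits being the first ones in the string, keeps the pattern itself trivial.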
#2
Like Konerak said, there is no equivalent of REGEXP_SUBSTR in MySQL. You could do what you need using SUBSTRING logic, but it is ugly:
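SELECT
    -- Outer query: keep the text up to (not including) the first space.
    -- CONCAT(..., ' ') also covers an order_id that is the last word.
    SUBSTRING(lastPart.end, 1, LOCATE(' ', CONCAT(lastPart.end, ' ')) - 1) AS orderId
FROM
(
    -- Inner query: everything after the 'order_id: ' marker.
    SELECT
        SUBSTRING(dataset.description, LOCATE('order_id: ', dataset.description) + LENGTH('order_id: ')) AS end
    FROM
    (
        -- Sample rows standing in for the real table.
        SELECT 'abc order_id: 2 xxxx yyy aa' AS description
        UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
        UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
    ) AS dataset
) AS lastPart;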
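The nested query relies on 1-based positions and on what SUBSTRING does with a non-positive length. The Python sketch below replays the same arithmetic with small stand-ins for LOCATE and SUBSTRING (simplified: case-sensitive matching, pos >= 1 only), so each step can be checked by hand. It also shows an edge case: if the order_id were the last word with no trailing space, LOCATE(' ', ...) would return 0, the length would become -1 and the result an empty string; padding with a space first, e.g. CONCAT(..., ' '), avoids that.

```python
from typing import Optional


def locate(substr: str, s: str) -> int:
    """MySQL LOCATE(substr, str): 1-based position of the first match, 0 if absent.

    Case-sensitive here; MySQL's LOCATE follows the collation, so it may not be.
    """
    return s.find(substr) + 1


def substring(s: str, pos: int, length: Optional[int] = None) -> str:
    """MySQL SUBSTRING(str, pos[, len]) for pos >= 1; a len below 1 yields ''."""
    start = pos - 1
    if length is None:
        return s[start:]
    return s[start:start + max(length, 0)]


MARKER = "order_id: "


def order_id(description: str, pad: bool = False) -> str:
    # Inner derived table: everything after 'order_id: '.
    end = substring(description, locate(MARKER, description) + len(MARKER))
    # Outer SELECT: up to (not including) the next space. With pad=True a space
    # is appended first, like LOCATE(' ', CONCAT(end, ' ')) in SQL.
    haystack = end + " " if pad else end
    return substring(end, 1, locate(" ", haystack) - 1)


print(order_id("abc order_id: 2 xxxx yyy aa"))   # 2
print(order_id("mmm order_id: 1523 nn kk yw"))   # 1523
print(repr(order_id("xyz order_id: 42")))        # '' (no space after the id)
print(order_id("xyz order_id: 42", pad=True))    # 42
```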
Edit: You could try this user-defined function, which provides access to Perl regular expressions in MySQL:
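SELECT
    -- Capture group 1 holds the digits. The backslashes are doubled because a
    -- MySQL string literal would otherwise turn \s and \d into plain s and d
    -- (assumes the default sql_mode, i.e. NO_BACKSLASH_ESCAPES is not set).
    PREG_CAPTURE('/.*order_id:\\s(\\d+).*/', dataset.description, 1) AS orderId
FROM
(
    SELECT 'abc order_id: 2 xxxx yyy aa' AS description
    UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
    UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
) AS dataset;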
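For comparison, the same pattern can be tried outside the database. PREG_CAPTURE(pattern, subject, 1) is used here to return capture group 1; the Python sketch below applies the pattern (without the /.../ delimiters) with the standard re module. In a Python raw string the backslashes in \s and \d reach the regex engine untouched, whereas inside a MySQL string literal they have to be doubled. Returning None when nothing matches is this sketch's own choice, not a statement about PREG_CAPTURE.

```python
import re
from typing import Optional

# The PREG_CAPTURE pattern without its /.../ delimiters.
ORDER_ID_PATTERN = re.compile(r".*order_id:\s(\d+).*")


def capture_order_id(description: str) -> Optional[str]:
    """Return capture group 1 (the digits after 'order_id: '), or None."""
    m = ORDER_ID_PATTERN.search(description)
    return m.group(1) if m else None


for d in ["abc order_id: 2 xxxx yyy aa",
          "mmm order_id: 3 nn kk yw",
          "mmm order_id: 1523 nn kk yw"]:
    print(capture_order_id(d))
# prints 2, 3 and 1523, one per line
```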
#3
There is no MySQL equivalent. MySQL's REGEXP operator can be used to match strings, but not to transform them.
You can either try to work with stored procedures and a lot of REPLACE/SUBSTRING logic, or do it in your application's programming language, which should be the easiest option.
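A minimal sketch of the "do it in your programming language" route, written in Python. An in-memory SQLite database stands in for the real MySQL table so the example runs on its own; the table name tasks is hypothetical, and against MySQL the same SELECT would go through whichever driver the application already uses. The regex tolerates any amount of whitespace after the colon, and rows without an order_id are simply skipped.

```python
import re
import sqlite3
from typing import List

ORDER_ID = re.compile(r"order_id:\s*(\d+)")


def extract_order_ids(conn: sqlite3.Connection) -> List[int]:
    """Read every description and pull the order_id out in application code."""
    ids = []
    for (description,) in conn.execute("SELECT description FROM tasks"):
        match = ORDER_ID.search(description)
        if match:                      # rows without an order_id are skipped
            ids.append(int(match.group(1)))
    return ids


# In-memory SQLite stands in for the real queue table (hypothetical name: tasks).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (description TEXT)")
conn.executemany(
    "INSERT INTO tasks (description) VALUES (?)",
    [("abc order_id: 2 xxxx yyy aa",), ("mmm order_id: 3 nn kk yw",)],
)
print(sorted(extract_order_ids(conn)))  # [2, 3]
```

With at most 100 rows of ~256 characters, pulling everything into the application like this costs next to nothing.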
But are you sure your data format is well chosen? If you need the order_id, wouldn't it make sense to store it in a separate column, so you can add indexes, use joins and the like?
#4
Or you can do this and save yourself the ugliness:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('abc order_id: 2 xxxx yyy aa', ' ', 3), ' ', -1);
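This works because the order_id happens to be the third space-separated word in the sample. The Python sketch below imitates SUBSTRING_INDEX(str, delim, count): a positive count keeps everything left of the count-th delimiter, a negative count keeps everything right of the count-th delimiter counted from the end, and 0 yields an empty string. It assumes a non-empty delimiter and, like MySQL's SUBSTRING_INDEX, matches the delimiter case-sensitively. The second call shows what the nested expression returns when the text before order_id: contains a different number of words.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Python imitation of MySQL SUBSTRING_INDEX(str, delim, count).

    Assumes a non-empty delimiter. count > 0 keeps the text left of the
    count-th delimiter from the left; count < 0 keeps the text right of the
    count-th delimiter from the right; count == 0 gives ''.
    """
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def third_word(description: str) -> str:
    # Same nesting as the SQL: keep the first three words, then the last of those.
    return substring_index(substring_index(description, " ", 3), " ", -1)


print(third_word("abc order_id: 2 xxxx yyy aa"))      # 2
print(third_word("abc def order_id: 2 xxxx yyy aa"))  # order_id:
```

Anchoring on the 'order_id: ' marker instead, as the LOCATE-based answer does, avoids this dependence on word position.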