Using BigQuery/Google Analytics Regexp_extract to extract a substring from character A to character B or EOL

Time: 2021-09-29 15:24:55

I'm working with Google BigQuery and trying to extract some information from a string column into another column using Regexp_extract. In short:

Data in myVariable:

yippie/eggs-spam/?portlet:hungry=1234
yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890

I want a column with:

1234
456

My command:

SELECT Regexp_extract(myVariable, r'SOME_MAGIC') as result
FROM table

I tried for SOME_MAGIC:

hungry=(.*)[&$] - null, 456 (I learned that $ is taken literally inside a character class)
hungry=(.*)(&|$) - Error: Exactly one capturing group must be specified
hungry=(.*)^& - null, null
hungry=(&.*)?$ - null, null
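
The results above can be reproduced outside BigQuery. The following is only a minimal sketch: it assumes Python's re module as a stand-in for BigQuery's regex engine, and the extract helper is a hypothetical name, not part of either tool. It also shows one way to keep a single capturing group while still allowing "& or end of string", by making the alternation non-capturing and the capture lazy.

```python
import re

rows = [
    "yippie/eggs-spam/?portlet:hungry=1234",
    "yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890",
]


def extract(pattern, text):
    """Return the first capturing group, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Inside [...] the "$" is a literal dollar sign, so a trailing "&" is required.
literal_dollar = [extract(r"hungry=(.*)[&$]", row) for row in rows]

# "^" only matches at the start of the string, so this can never succeed.
misplaced_caret = [extract(r"hungry=(.*)^&", row) for row in rows]

# A lazy capture plus a non-capturing (?:...) alternation keeps exactly one group.
lazy_until_amp_or_end = [extract(r"hungry=(.*?)(?:&|$)", row) for row in rows]

print(literal_dollar)
print(misplaced_caret)
print(lazy_until_amp_or_end)
```

If the same pattern is tried in Regexp_extract, hungry=(.*?)(?:&|$) has only one capturing group, so it should not trigger the "Exactly one capturing group" error, but that is worth checking against the real table.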

I read this, but there the number has a fixed length. I also looked at this, but "?=" is rejected as an unknown perl operator.

Does anybody have an idea? Thank you in advance!

2 solutions

#1


I just found an answer to how I can solve my problem differently:

hungry=([0-9]+) - 1234, 456

It isn't an answer to my abstract question (a regex for selecting from character A to [character B or EOL]), so it's not that satisfying. E.g. it won't work with

yippie/eggs-spam/?portlet:hungry=12AB34

However, my original problem is solved. I'll leave the question open for a while in case somebody has a better answer.
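
For the abstract question, one common approach is a negated character class: capture every character that is not "&", which stops at the next "&" or at the end of the string, whichever comes first. The sketch below is an illustration under assumptions, not a tested BigQuery query: it uses Python's re module in place of Regexp_extract, and hungry_value is a hypothetical helper name.

```python
import re

# Capture everything after "hungry=" that is not "&"; the run ends at the
# next "&" or at the end of the string, whichever comes first.
PATTERN = re.compile(r"hungry=([^&]*)")


def hungry_value(url):
    """Return the value of the hungry parameter, or None if it is absent."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


samples = [
    "yippie/eggs-spam/?portlet:hungry=1234",
    "yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890",
    "yippie/eggs-spam/?portlet:hungry=12AB34",
]

values = [hungry_value(sample) for sample in samples]
print(values)
```

The pattern hungry=([^&]*) contains exactly one capturing group and no lookahead, so it should be usable as SOME_MAGIC in the query from the question, and unlike hungry=([0-9]+) it also covers the 12AB34 case.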

#2


I think I had a similar problem where I was trying to select the last 6 characters of a string (link_id) as a new column.

I kept getting this error:

Exactly one capturing group must be specified

My code originally was:

SELECT
...
REGEXP_EXTRACT(link_id, r'......$') AS updated_link_id
FROM sometable;

To get rid of the error and retrieve the correct substring as a column, I had to wrap the whole regex in parentheses so that it contains one capturing group.

SELECT
...
REGEXP_EXTRACT(link_id, r'(......$)') AS updated_link_id
FROM sometable;
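
The effect of the added parentheses can be checked quickly outside BigQuery. This is only a sketch under assumptions: Python's re module does not raise the "Exactly one capturing group" error, so group(1) stands in for the single group that REGEXP_EXTRACT returns, and both last_six and the sample link_id values are made up for illustration.

```python
import re


def last_six(link_id):
    """Return the last six characters via one capturing group, else None."""
    # The parentheses make the whole match available as group 1.
    match = re.search(r"(......$)", link_id)
    return match.group(1) if match else None


# Hypothetical link_id values, not taken from a real table.
long_id = last_six("t3_abc123xyz")
exact_id = last_six("abcdef")
short_id = last_six("abc")

print(long_id, exact_id, short_id)
```

A value shorter than six characters produces no match, which corresponds to a null in the result column.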
