Extracting a substring from a given string

Time: 2022-09-13 13:34:48

I have a string here: "This is a string: AAA123456789".

The idea is to extract the substring AAA123456789 using a regex.

I am using this together with XPath.

Note: If there is already a post about this, kindly point me to it.

I think I should use something like substring(myNode, [^AAA\d+{9}]), but I am not really sure about the regex part.

The idea is to extract the substring that starts with "AAA" and is followed by digits only, and exactly 9 consecutive digits.

4 answers

#1


9  

Pure XPath solution:

substring-after('This is a string: AAA123456789', ': ')

produces:

AAA123456789

XPath 2.0 solutions:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[starts-with(., 'AAA')]

or:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[matches(., 'AAA\d+')]

or:

replace('This is a string: AAA123456789 but not an double',
              '^.*?(A+\d+).*$',
              '$1'
              )
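
For a replace pattern of this shape, whether the leading .* is greedy or reluctant changes what the group captures. The sketch below approximates the three XPath 2.0 expressions in Python so the difference can be checked quickly. It is only an analogue: Python's re module is not the XPath regex flavor, and the variable names are made up for illustration.

```python
import re

text = "This is a string: AAA123456789 but not an double"

# Rough analogue of tokenize(text, ' ')[starts-with(., 'AAA')]
by_prefix = [tok for tok in text.split(" ") if tok.startswith("AAA")]

# Rough analogue of tokenize(text, ' ')[matches(., 'AAA\d+')];
# like matches(), re.search succeeds if the pattern occurs anywhere in the token
by_regex = [tok for tok in text.split(" ") if re.search(r"AAA\d+", tok)]

# Greedy leading .* swallows as much as it can, leaving a single "A" for A+
greedy = re.sub(r"^.*(A+\d+).*$", r"\1", text)

# Reluctant leading .*? stops at the first place the group can match
reluctant = re.sub(r"^.*?(A+\d+).*$", r"\1", text)

print(by_prefix, by_regex, greedy, reluctant)
```

With the greedy form the result is A123456789, with the reluctant form it is AAA123456789. Spelling the prefix out as AAA instead of A+ would also avoid the problem.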

#2


4  

Alright, after going through the answers and comments from the wonderful people here, I settled on the following solution:

concat("AAA", substring(substring-after(., "AAA"), 1, 9))

First, substring-after(., "AAA") returns everything that follows the first occurrence of "AAA". Then substring(..., 1, 9) keeps the 9 characters starting at position 1; anything beyond that is ignored. Since "AAA" was used as the marker, it is not part of the result, so I concatenate "AAA" back onto the front. In other words, I take the first 9 digits after AAA and then prepend AAA, since it is static data.
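
The same three steps can be sketched in Python to see what the expression returns on different inputs. This is a minimal sketch, not the XPath implementation: substring_after and extract_code are hypothetical helper names, and the marker is assumed to be non-empty.

```python
def substring_after(value: str, marker: str) -> str:
    """Mimic XPath substring-after(): text after the first marker, '' if absent."""
    _, found, rest = value.partition(marker)
    return rest if found else ""


def extract_code(value: str, prefix: str = "AAA", length: int = 9) -> str:
    # substring(x, 1, 9) keeps the first 9 characters (XPath positions are 1-based),
    # then the static prefix is put back in front, as concat() does
    return prefix + substring_after(value, prefix)[:length]


print(extract_code("This is a string: AAA123456789 and trailing text"))
```

Note that this approach counts characters rather than digits: extract_code("AAA12345 more") gives "AAA12345 mor", and an input without the marker gives just "AAA". If the input may be malformed, a regex check on the result is a reasonable extra step.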

This keeps the data correct no matter what else the string contains.

But I also like the regex by @Dimitre, the replace part. I am less keen on tokenize, because what if the tokens are not separated by a space? The replace with a regex is wonderful too. Thanks.

And thanks also to the rest of you out there...

#3


1  

First, I'm pretty sure you don't mean to have the [^ ... ]. That defines a negated character class, i.e. your current regex says, "Give me a single character that is not one of the following: A0123456789+{}". You probably meant, plainly, "AAA(\d{9})". Now, according to this handy website, XPath does support capture groups, as well as backreferences, so take your pick:
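
To make the difference concrete, here is a small Python check of the two patterns. It is a sketch using Python's re module as a stand-in, so details may differ from the XPath regex flavor, and the variable names are illustrative only.

```python
import re

text = "This is a string: AAA123456789"

# One negated character class: any single character that is not
# "A", a digit, "+", "{" or "}" (the repeated "A" and the "9" add nothing)
negated = re.compile(r"[^AAA\d+{9}]")

# The literal "AAA" followed by exactly nine digits, captured as group 1
intended = re.compile(r"AAA(\d{9})")

first_negated = negated.search(text).group(0)
match = intended.search(text)

print(repr(first_negated), match.group(0), match.group(1))
```

The negated class matches the very first character of the string ("T"), while the intended pattern finds AAA123456789 and captures 123456789.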

"AAA(\d{9})"

And extract $1, the first capture group, or:

"(?<=AAA)\d{9}"

And take the whole match ($0).
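
Both variants can be compared side by side in Python, as a rough sketch. Lookbehind is not available in every regex flavor, so it is worth checking the XPath processor in use before relying on the second form. The last pattern, with a (?!\d) lookahead, is an addition made here on the assumption that "only 9 digits" means a longer run of digits should be rejected.

```python
import re

text = "This is a string: AAA123456789"

# Variant 1: match the prefix and read the digits from capture group 1
with_group = re.search(r"AAA(\d{9})", text)
digits_from_group = with_group.group(1)

# Variant 2: lookbehind keeps "AAA" out of the match, so the whole match is the digits
with_lookbehind = re.search(r"(?<=AAA)\d{9}", text)
digits_from_lookbehind = with_lookbehind.group(0)

# Without a boundary check, ten digits still match (the first nine are taken)
loose = re.search(r"AAA(\d{9})", "AAA1234567890")

# Assumed stricter reading: refuse the match when a tenth digit follows
strict = re.search(r"AAA(\d{9})(?!\d)", "AAA1234567890")

print(digits_from_group, digits_from_lookbehind, loose.group(1), strict)
```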

#4


0  

Can you try this:

A{3}(\d{9})
