Does anyone know how to match an even number and an odd number of letters using REGEXP in MySQL? I need to match an even number of A's followed by an odd number of G's and then at least one TC. For example, acgtccAAAAGGGTCatg would match. It's for DNA sequencing.
2 solutions
#1
23
An even number of A's can be expressed as (AA)+ (one or more instances of AA, so it'll match AA, AAAA, AAAAAA...). An odd number of G's can be expressed as G(GG)* (one G followed by zero or more instances of GG, so that'll match G, GGG, GGGGG...).
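As a quick sanity check of these two building blocks, here is a minimal sketch using Python's re module rather than MySQL (an assumption made only so the patterns can be run in isolation). The helper names are hypothetical, and fullmatch is used so that the whole run has to fit the pattern.

```python
import re

# Building blocks from the explanation above.
EVEN_AS = re.compile(r"(AA)+")   # AA, AAAA, AAAAAA, ...
ODD_GS = re.compile(r"G(GG)*")   # G, GGG, GGGGG, ...


def is_even_run_of_a(run: str) -> bool:
    """True if `run` is a non-empty, even-length run of 'A' and nothing else."""
    return EVEN_AS.fullmatch(run) is not None


def is_odd_run_of_g(run: str) -> bool:
    """True if `run` is an odd-length run of 'G' and nothing else."""
    return ODD_GS.fullmatch(run) is not None


if __name__ == "__main__":
    for run in ["AA", "AAA", "AAAA", "G", "GG", "GGG"]:
        print(run, is_even_run_of_a(run), is_odd_run_of_g(run))
```

Note that (AA)+ requires at least two A's; an empty run is not accepted.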
Put that together and you've got:
/(AA)+G(GG)*TC/
However, since the pattern isn't anchored, the regex engine can start the match at any position in the string, so this expression will also match a substring of AAAGGGTC (i.e. AAGGGTC), even though that run has an odd number of A's! To prevent that, you could use a negative lookbehind to ensure that the character before the first A isn't another A:
/(?<!A)(AA)+G(GG)*TC/
...except that MySQL doesn't support lookarounds in its regexes (at least not before version 8.0.4, which switched to the ICU regex library).
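In an engine that does support lookbehind, the difference between the two patterns is easy to check. The following is a rough sketch in Python's re module (an assumption, not MySQL); the constant names and the find helper are hypothetical.

```python
import re

# The two patterns discussed above, without the /.../ delimiters.
PLAIN = re.compile(r"(AA)+G(GG)*TC")
GUARDED = re.compile(r"(?<!A)(AA)+G(GG)*TC")


def find(pattern: re.Pattern, sequence: str):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(sequence)
    return m.group(0) if m else None


if __name__ == "__main__":
    odd_run = "AAAGGGTC"            # three A's: should be rejected
    sample = "acgtccAAAAGGGTCatg"   # example from the question

    print(find(PLAIN, odd_run))     # the unguarded pattern skips the first A
    print(find(GUARDED, odd_run))   # the lookbehind blocks that
    print(find(GUARDED, sample))
```

The unguarded pattern finds AAGGGTC inside AAAGGGTC by starting at the second A, which is exactly the false positive described above.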
What you can do instead is specify that the pattern either starts at the beginning of the string (anchored by ^), or is preceded by a character that's not A:
/(^|[^A])(AA)+G(GG)*TC/
But note that with this pattern, an extra character is included in the match whenever the pattern isn't found at the start of the string, so you'll have to chop off the first character of the match if it's not an A.
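The trimming step can be sketched as follows. This again uses Python's re module purely for illustration (an assumption; the same pattern string uses no lookarounds), and find_motif is a hypothetical helper name. The first capture group holds either nothing (match at the start of the string) or the single non-A character, so its length says how much to chop off.

```python
import re

# Lookaround-free pattern from above: start of string, or one non-A character.
WORKAROUND = re.compile(r"(^|[^A])(AA)+G(GG)*TC")


def find_motif(sequence: str):
    """Return the matched motif without the extra leading character, or None."""
    m = WORKAROUND.search(sequence)
    if m is None:
        return None
    # group(1) is "" at the start of the string, otherwise the one non-A
    # character that precedes the run of A's; drop it from the match.
    return m.group(0)[len(m.group(1)):]


if __name__ == "__main__":
    print(find_motif("acgtccAAAAGGGTCatg"))
    print(find_motif("AAGTC"))
    print(find_motif("AAAGGGTC"))
```

In MySQL itself the pattern would be passed as a string rather than between slashes, for example WHERE seq REGEXP '(^|[^A])(AA)+G(GG)*TC', where seq is a hypothetical column name.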
#2
1
You can maybe try something like (AA)*(GG)*GTC
I think that would do the trick. I don't know if there's a special syntax for MySQL, though.
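One way to try this suggestion out is a small check like the one below, written with Python's re module as a stand-in for MySQL (an assumption); matched_text is a hypothetical helper. It suggests that the pattern matches the example from the question, but, being unanchored and using * for the A's, it also accepts an odd run of A's (by starting at the second A) and a bare GTC with no A's at all.

```python
import re

# Pattern suggested above, without delimiters.
CANDIDATE = re.compile(r"(AA)*(GG)*GTC")


def matched_text(sequence: str):
    """Return the first substring matched by the candidate pattern, or None."""
    m = CANDIDATE.search(sequence)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sequence in ["acgtccAAAAGGGTCatg", "AAAGGGTC", "GTC", "AAGGTC"]:
        print(sequence, "->", matched_text(sequence))
```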