JavaScript regex: how can matches share the same keyword?

Date: 2021-02-19 21:00:19

In this jsfiddle I have a regex that extracts tables from SQL select statements. The regex is run on two SQL statements and they should return the same matches: ["table1 t1, table2 t2", "table3 t3"].

The first SQL statement works fine (see console.log), but the second does not: table3 t3 is not detected, apparently because the join keyword is both the last keyword of the first match and the first keyword of the second match.

Is there a way to tell regex to "go back" when trying to match?

Javascript:

var sql = "select * from table1 t1, table2 t2 where t1.col in (select col from table3 t3)";

var myRegexp = /(:?from|join)\s(.*?)(:?\s+where|\s+on|\s+inner|\s+outer|\s+full|\s+left|\s+right|\s+join|\s*\))/gi;

console.log(getMatches(sql,myRegexp,2));

sql = "SELECT column FROM table1 t1, table2 t2 join table3 t3 on t1.column_name=t3.column_name";

console.log(getMatches(sql,myRegexp,2));

function getMatches(string, regex, index) {
    var matches = [];
    var match;
    while (match = regex.exec(string)) {
        matches.push(match[index]);
    }
    return matches;
}
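
The behaviour described above can be observed by inspecting the full match and the regex's lastIndex after the first exec call. The snippet below is only a diagnostic sketch; the variable names consuming, first, rest and second are made up for illustration and are not part of the original fiddle.

```javascript
// The second SQL statement and the regex from the question, unchanged.
var sql = "SELECT column FROM table1 t1, table2 t2 join table3 t3 on t1.column_name=t3.column_name";
var consuming = /(:?from|join)\s(.*?)(:?\s+where|\s+on|\s+inner|\s+outer|\s+full|\s+left|\s+right|\s+join|\s*\))/gi;

var first = consuming.exec(sql);
// The whole match (first[0]) ends with " join", so that keyword is consumed.
console.log(JSON.stringify(first[0]));

// lastIndex now points just past "join"; the next search starts there.
var rest = sql.slice(consuming.lastIndex);
console.log(JSON.stringify(rest));

// Neither "from" nor "join" is left in the remainder, so the next exec returns null.
var second = consuming.exec(sql);
console.log(second);
```

Because the closing keyword is part of the first match, the engine never sees join as a possible start of the next one.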

3 answers

#1


This will work if you don’t mind comma-separated tables being a single match:

/(?:from|join)\s+(.*?)(?:\)|\s+(?=where|on|inner|outer|full|left|right|join))/gi

For example, it returns "table1 t1, table2 t2" and "table3 t3" for the example in your question (quotes added around the matches).

P.S. As @GUIDO says, the non-capturing group syntax is (?: ), not (:? ). With this regex the table list is capture group 1, so you’ll need to change your function calls to getMatches(sql,myRegexp,1).
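
As a rough check, the regex from this answer can be run against both statements from the question with the question's own getMatches helper. The names tablesRegex, sql1, sql2, result1 and result2 are chosen here for illustration only.

```javascript
// Same helper as in the question: collects one capture group from every match.
function getMatches(string, regex, index) {
    var matches = [];
    var match;
    while ((match = regex.exec(string)) !== null) {
        matches.push(match[index]);
    }
    return matches;
}

// Regex from this answer: the closing keyword sits inside a lookahead,
// so it is checked but not consumed and can start the next match.
var tablesRegex = /(?:from|join)\s+(.*?)(?:\)|\s+(?=where|on|inner|outer|full|left|right|join))/gi;

var sql1 = "select * from table1 t1, table2 t2 where t1.col in (select col from table3 t3)";
var sql2 = "SELECT column FROM table1 t1, table2 t2 join table3 t3 on t1.column_name=t3.column_name";

// Capture group 1 holds the table list, hence index 1 instead of 2.
var result1 = getMatches(sql1, tablesRegex, 1);
var result2 = getMatches(sql2, tablesRegex, 1);
console.log(result1);
console.log(result2);
```

Both calls should log the same two entries, which is the result the question asks for.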

#2


Guess I must preface this with the obligatory "Is this such a good idea?" Regex is typically not the right tool for things like this (e.g. parsing).

That said, I think this is what you're looking for.

There are what's known as positive and negative lookaheads, which instruct the regex engine to check that the match is (or is not) followed by something, without that something becoming part of the match:

Positive lookahead works just the same. q(?=u) matches a q that is followed by a u, without making the u part of the match. The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign.
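
The difference between consuming the u and merely looking ahead at it can be sketched as follows. The sample strings and variable names are illustrative assumptions, not taken from the question.

```javascript
var text = "quit";

// Consuming version: the "u" is part of the match.
var consuming = /qu/g;
var m1 = consuming.exec(text);
console.log(m1[0], consuming.lastIndex); // matched text, and where the next search starts

// Positive lookahead: "u" must follow, but it is not part of the match.
var lookahead = /q(?=u)/g;
var m2 = lookahead.exec(text);
console.log(m2[0], lookahead.lastIndex);

// Negative lookahead: a "q" that is NOT followed by "u".
var negative = /q(?!u)/;
var inQuit = negative.test("quit");
var inIraq = negative.test("Iraq");
console.log(inQuit, inIraq);
```

With the lookahead, lastIndex stops before the u, so the next search can still use that character. That is what lets a keyword such as join end one match and begin the next.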

There are also corresponding positive and negative lookbehinds, but JavaScript has only supported them since ES2018, so older engines will reject them.
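
For completeness, here is a minimal lookbehind sketch. It assumes an ES2018-or-later engine and only picks out the table name that follows each keyword, not the alias, so it is not a drop-in replacement for the regexes above. The names afterKeyword, sql and names are hypothetical.

```javascript
// Positive lookbehind: match a word only if "from" or "join" plus whitespace
// comes right before it. The keyword itself is not part of the match.
// Assumption: the engine supports ES2018 lookbehind assertions.
var afterKeyword = /(?<=\b(?:from|join)\s+)\w+/gi;

var sql = "SELECT column FROM table1 t1 join table3 t3 on t1.id = t3.id";
var names = sql.match(afterKeyword);
console.log(names);
```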

#3


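This regex should give more accurate results: it also treats the comma as a delimiter, so each table is matched separately. Use it with the g and i flags, as in the question:

/(?:from|join|,)\s(.*?)(?=\s+where|\s+on|\s+inner|\s+outer|\s+full|\s+left|\s+right|\s+join|\s*,|\s*\))/gi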

regex: https://regex101.com/r/lE0rD9/1
fiddle: http://jsfiddle.net/9m0cyLz2/
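
A possible way to try this regex with the question's helper is sketched below. It assumes the g and i flags, which the getMatches loop and the upper-case SQL need; the variable names perTable, sqlA, sqlB, tablesA and tablesB are made up for this example.

```javascript
// Same helper as in the question.
function getMatches(string, regex, index) {
    var matches = [];
    var match;
    while ((match = regex.exec(string)) !== null) {
        matches.push(match[index]);
    }
    return matches;
}

// Regex from this answer. The comma acts as both a start and an end delimiter,
// and every end delimiter sits in a lookahead, so nothing is consumed twice.
// Assumption: "gi" flags, so exec advances and FROM/JOIN match in any case.
var perTable = /(?:from|join|,)\s(.*?)(?=\s+where|\s+on|\s+inner|\s+outer|\s+full|\s+left|\s+right|\s+join|\s*,|\s*\))/gi;

var sqlA = "select * from table1 t1, table2 t2 where t1.col in (select col from table3 t3)";
var sqlB = "SELECT column FROM table1 t1, table2 t2 join table3 t3 on t1.column_name=t3.column_name";

var tablesA = getMatches(sqlA, perTable, 1);
var tablesB = getMatches(sqlB, perTable, 1);
console.log(tablesA);
console.log(tablesB);
```

Each table and its alias comes back as its own entry, rather than as one comma-separated string.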
