How to match text with and without a negative lookahead expression in a JavaScript RegExp

Date: 2021-02-05 15:18:47

Suppose I have a comma-separated string of company names, where each name may or may not be followed (after another comma) by a token from a list like:


var tokens=['Inc.','Ltd','LLC'];

so the string looks like this:


var companies="Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

I want to obtain this array as output


var companiesList = [
    "Apple Inc.",
    "Microsoft Inc.",
    "Buzzfeed",
    "Treasure LLC"
];

So I first wrote a RegExp like this to collect every match for a token:


var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "gi");

Then, for each match, I run the same pattern without the global flag to read its capture groups:


var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "i");

I do this for each of the tokens:


tokens.forEach((item) => {
    var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "gi");
    var matches = companies.match(regex) || [];
    console.log(item, regex.toString(), matches);
    matches.forEach((m) => {
        var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "i");
        var match = m.match(regex);
        if (match && match.length > 2) {
            var n = match[1].trim();
            var c = match[2].trim();
            companiesList.push(n + ' ' + c);
        }
    });
});

This way I capture the tokens and concatenate capture groups 1 and 2:


var tokens = ['inc.', 'ltd', 'llc'],
  companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC",
  companiesList = [];
tokens.forEach((item) => {
  var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "gi");
  var matches = companies.match(regex) || [];
  console.log(item, regex.toString(), matches);
  matches.forEach((m) => {
    var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "i");
    var match = m.match(regex);
    if (match && match.length > 2) {
      var n = match[1].trim();
      var c = match[2].trim();
      companiesList.push(n + ' ' + c);
    }
  });
});
console.log(companiesList); // ["Apple Inc.", "Microsoft Inc.", "Treasure LLC"]
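As an aside, the two-pass match above (a global match, then a second non-global match on each result just to read the groups) can be done in one pass with String.prototype.matchAll, which returns the capture groups of every match. This is a minimal sketch assuming an ES2020 runtime; escapeRegExp is a hypothetical helper modelled on the escaping function in MDN's regular expressions guide, added because the original code passes "Inc." to RegExp unescaped, so its dot matches any character. Like the question's code, it still does not return Buzzfeed.

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Hypothetical helper: escape RegExp metacharacters so "Inc." matches a literal dot.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One alternation for all tokens, i.e. (Inc\.|Ltd|LLC)
const tokenGroup = "(" + tokens.map(escapeRegExp).join("|") + ")";
// Same name pattern as in the question, followed by ", <token>"
const regex = new RegExp("([a-zA-Z&/? ]*),\\s+" + tokenGroup, "gi");

// matchAll yields every match together with its capture groups,
// so no second, non-global match is needed.
const withToken = [];
for (const m of companies.matchAll(regex)) {
  withToken.push(m[1].trim() + " " + m[2]);
}
console.log(withToken);
```

Using a single alternation also means the string is scanned once instead of once per token.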


The problem is that this misses the comma-separated entries that have no token after them, such as Buzzfeed.


My idea was to use a non-capturing group inside a negative lookahead (see the discussion of non-capturing groups in regex matches).



But this way I don't get any match at all when the token is present in the input string:


"Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure LLC".match( /([a-zA-Z]*)^(?:(?!llc).)+$/gi )

whereas I want to match only the entries that do not have the token, so I would like to get the opposite of the previous result.
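For reference, that opposite match can be expressed with negative lookaheads if each comma-separated field is treated as a unit, instead of anchoring the whole string with ^ and $ as the attempt above does. The sketch below is an illustration, not the question's code: escapeRegExp is a hypothetical helper, and the two lookaheads check that the current field is not itself a token and that the next field is not one either.

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Hypothetical helper: escape RegExp metacharacters so "Inc." matches a literal dot.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// A whole comma-separated field that is exactly one of the tokens.
const tok = "(?:" + tokens.map(escapeRegExp).join("|") + ")";
const isTokenField = "\\s*" + tok + "\\s*(?:,|$)";

const withoutToken = new RegExp(
  "(?:^|,)" +                    // start of the string or a separator
  "(?!" + isTokenField + ")" +   // this field is not itself a token
  "\\s*([^,\\s][^,]*?)\\s*" +    // capture the field without surrounding spaces
  "(?=,|$)" +                    // the field ends at a comma or the end of the string
  "(?!," + isTokenField + ")",   // and the next field is not a token
  "gi"
);

const names = [...companies.matchAll(withoutToken)].map((m) => m[1]);
console.log(names);
```

The first lookahead is placed before the \s* on purpose: if it came after, the engine could backtrack over the space and start the field at " Inc.", slipping past the token check.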



So how can I negate or modify the previous code so that it works in both cases and, in the end, produces the combined array:


var companiesList = [
    "Apple Inc.",
    "Microsoft Inc.",
    "Buzzfeed",
    "Treasure LLC"
];

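One way to cover both cases with a single regular expression is to make the token part optional rather than negating it (an optional group, not a negative lookahead). A sketch under the same assumptions as the question (company names contain no commas, tokens come from the tokens array); escapeRegExp is a hypothetical helper:

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Hypothetical helper: escape RegExp metacharacters so "Inc." matches a literal dot.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const tok = tokens.map(escapeRegExp).join("|");

// Each match is one company: a name, optionally followed by ", <token>".
// Group 1 is the name, group 2 the token (undefined when there is none).
const company = new RegExp(
  "(?:^|,)\\s*([^,\\s][^,]*?)\\s*" +
  "(?:,\\s*(" + tok + ")\\s*(?=,|$))?" +
  "(?=,|$)",
  "gi"
);

const companiesList = [...companies.matchAll(company)].map((m) =>
  m[2] ? m[1] + " " + m[2] : m[1]
);
console.log(companiesList);
```

Because the optional group is greedy, the engine first tries to absorb a following token field and only falls back to a bare name when the next field is not a token.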
2 answers



Wouldn't it be a lot easier to just split and reduce it, checking the token list as you go?


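var tokens    = ['Inc.','Ltd','LLC'];
var companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

var result = companies.split(',').reduce((acc, part) => {
    var name = part.trim();
    // a token is appended to the previous name, anything else starts a new entry
    if (tokens.indexOf(name) === -1) acc.push(name);
    else acc[acc.length - 1] += ' ' + name;
    return acc;
}, []);
console.log(result); // ["Apple Inc.", "Microsoft Inc.", "Buzzfeed", "Treasure LLC"]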
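The comparison above is case-sensitive, while the question's own snippet lists the tokens in lower case ('inc.', 'ltd', 'llc'). A possible case-insensitive variant, sketched here with a Set of lower-cased tokens; skipping empty fields and keeping a leading token as its own entry are assumptions added for robustness, not part of the answer:

```javascript
const tokens = ["inc.", "ltd", "llc"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Lower-cased lookup, so "Inc." in the string matches "inc." in the list.
const tokenSet = new Set(tokens.map((t) => t.toLowerCase()));

const result = companies.split(",").reduce((acc, part) => {
  const name = part.trim();
  if (name === "") return acc; // assumption: skip empty fields, e.g. from ",,"
  if (tokenSet.has(name.toLowerCase()) && acc.length > 0) {
    acc[acc.length - 1] += " " + name; // token: attach it to the previous name
  } else {
    acc.push(name); // anything else (or a leading token) starts a new entry
  }
  return acc;
}, []);

console.log(result);
```

The token text is taken from the input string, so the output keeps its original casing ("Inc.", "LLC").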




You could use a regex for splitting.


var companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

console.log(companies.split(/,\s(?!Inc\.|Ltd|LLC)/i).map(s => s.replace(', ', ' ')));
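If the token list changes, the split pattern can be built from the tokens array instead of being hard-coded. A sketch, with escapeRegExp as a hypothetical helper; unlike the answer's (?!Inc\.|Ltd|LLC), this version also requires the token to be the whole next field, so a name that merely starts with a token (for example "LLCompany") would still be split off:

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Hypothetical helper: escape RegExp metacharacters in each token.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const tok = "(?:" + tokens.map(escapeRegExp).join("|") + ")";

// Split at a comma (plus any following spaces) unless the next field is
// exactly one of the tokens. The lookahead sits before \s* so that
// backtracking over the spaces cannot sneak past it.
const splitter = new RegExp(",(?!\\s*" + tok + "\\s*(?:,|$))\\s*", "i");

const parts = companies
  .split(splitter)
  .map((s) => s.replace(/,\s*/, " ")); // "Apple, Inc." -> "Apple Inc."
console.log(parts);
```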


