
Time: 2021-10-09 12:54:40

Suppose I have two strings like the ones below:


var tester   = "hello I have to ask you a doubt";
var testCase = "hello better explain me the doubt";

In this example both strings contain the common words hello and doubt. My default string is tester, and a second variable, testCase, holds an arbitrary set of words. I want to count the words that are present in both tester and testCase, and get the result as an object:




{"hello" : 1, "doubt" : 1};

My current implementation is below:


var tester = "hello I have to ask you a doubt";
function getMeRepeatedWordsDetails(testCase){
    var defaultWords = tester.split(" ");
    var testWords    = testCase.split(" "), result = {};
    // for...of iterates the words themselves; for...in would iterate the array indices
    for(var testWord of testWords){
        for(var defaultWord of defaultWords){
            if(defaultWord == testWord){
                result[testWord] = (!result[testWord]) ? 1 : (result[testWord] + 1);
            }
        }
    }
    return result;
}

I suspect a regex could make this task easier, since regexes are built for finding pattern matches, but I'm not sure it can be done that way. Am I on the right path?
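For comparison, here is a minimal sketch of the non-regex path, assuming words are separated by whitespace and compared case-sensitively. It replaces the inner loop with a `Set` lookup; the function name `countCommonWords` is hypothetical. Note that it counts each occurrence in the second string once, whereas the nested loops count one hit per matching pair, so the two differ when tester repeats a word.

```javascript
// Hypothetical helper: count how often each word of `sentence` also occurs in `reference`.
// Assumption: words are runs of non-whitespace, compared case-sensitively.
function countCommonWords(reference, sentence) {
  // Build the lookup set once, so each membership test is O(1) on average
  const known = new Set(reference.split(/\s+/).filter(Boolean));
  const result = {};
  for (const word of sentence.split(/\s+/)) {
    // Skip empty strings produced by leading/trailing whitespace
    if (word && known.has(word)) {
      result[word] = (result[word] || 0) + 1;
    }
  }
  return result;
}

console.log(JSON.stringify(countCommonWords(
  "hello I have to ask you a doubt",
  "hello better explain me the doubt"
))); // {"hello":1,"doubt":1}
```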


1 Solution



You can use a first regular expression as a tokenizer to split the tester string into a list of words, then use those words to build a second regular expression that matches any of them. For example:


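var tester = "a string with a lot of words";

function getMeRepeatedWordsDetails ( sentence ) {
  // Append a space so that \W can also match after the last word
  sentence = sentence + " ";
  // Tokenizer: any run of non-whitespace characters counts as a word
  var regex = /[^\s]+/g;
  // Alternation of all tester words, each followed by one non-word character
  var regex2 = new RegExp ( "(" + tester.match ( regex ).join ( "|" ) + ")\\W", "g" );
  // match() returns null when nothing matches, so fall back to an empty array
  var matches = sentence.match ( regex2 ) || [];
  var words = {};
  for ( var i = 0; i < matches.length; i++ ) {
    // Strip the non-word character that \W captured
    var match = matches [ i ].replace ( /\W/g, "" );
    if ( ! words [ match ] ) {
      words [ match ] = 1;
    } else {
      words [ match ]++;
    }
  }
  return words;
}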

console.log ( getMeRepeatedWordsDetails ( "another string with some words" ) );
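The tester words are concatenated into the pattern unchanged, so a word containing a regex metacharacter (for example c++ or a.b) would change what regex2 matches, or make new RegExp throw a SyntaxError. A minimal sketch that escapes each word first, assuming tester contains at least one word; escapeRegExp and buildRegex2 are hypothetical helper names, not built-ins:

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: same (w1|w2|...)\W shape as regex2, but with escaped words.
// Assumes `tester` contains at least one non-whitespace word.
function buildRegex2(tester) {
  const words = tester.match(/[^\s]+/g);
  return new RegExp("(" + words.map(escapeRegExp).join("|") + ")\\W", "g");
}

console.log(buildRegex2("a string with a lot of words").source);
// prints (a|string|with|a|lot|of|words)\W
console.log(buildRegex2("c++ is fun").source);
// prints (c\+\+|is|fun)\W
```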

The tokenizer is the line:


var regex = /[^\s]+/g;

When you do:


tester.match ( regex )

you get the list of words contained in tester:


[ "a", "string", "with", "a", "lot", "of", "words" ]
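The tokenizer step can be checked on its own. A small sketch; [^\s] ("not whitespace") can also be written with the shorthand \S, which gives the same array:

```javascript
var tester = "a string with a lot of words";

// Tokenizer: every maximal run of non-whitespace characters is one word
var tokens = tester.match(/[^\s]+/g);
// \S is the shorthand for [^\s], so this produces the same array
var same = tester.match(/\S+/g);

console.log(JSON.stringify(tokens)); // ["a","string","with","a","lot","of","words"]
// Joining with "|" gives the alternation used inside regex2
console.log(tokens.join("|"));       // a|string|with|a|lot|of|words
```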

With such an array we build a second regular expression that matches any of those words; regex2 has the form:


/(a|string|with|a|lot|of|words)\W/g

The \W is appended so that a tester word only matches when it is followed by a non-word character; otherwise the alternative a would match the start of any word beginning with a, such as another. Note that this only guards the end of a word: a word that ends in a tester word, such as data, still yields a match for a. Applying regex2 to sentence yields an array containing only the words that appear in regex2, i.e. the words contained in both tester and sentence. The for loop then counts the entries of the matches array, turning it into the object you asked for.
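A variant that puts a word boundary \b on both sides of the group avoids the partial-word match, and since \b consumes no characters, it also removes the need for the trailing space and the cleanup replace. This is a sketch of an alternative, not the answer's code; the name getCommonWordCounts is hypothetical, and \b assumes words made of [A-Za-z0-9_] characters:

```javascript
var tester = "a string with a lot of words";

// Alternative sketch: \b on both sides means only whole words match,
// and since \b consumes no characters there is nothing to strip afterwards.
function getCommonWordCounts(sentence) {
  var words = tester.match(/[^\s]+/g);
  var regex2 = new RegExp("\\b(" + words.join("|") + ")\\b", "g");
  // match() returns null when nothing matches, so fall back to an empty array
  var matches = sentence.match(regex2) || [];
  var counts = {};
  for (var i = 0; i < matches.length; i++) {
    counts[matches[i]] = (counts[matches[i]] || 0) + 1;
  }
  return counts;
}

// "data" no longer contributes a match for "a"
console.log(JSON.stringify(getCommonWordCounts("data with a lot of words")));
// {"with":1,"a":1,"lot":1,"of":1,"words":1}
```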


But beware that:


  • you have to append at least one space to the end of sentence, otherwise the \W in regex2 has nothing to match after the last word: sentence = sentence + " "
  • you have to strip from each match the extra character that was captured by \W: match = matches [ i ].replace ( /\W/g, "" )
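Both caveats are easy to see on a small input. This sketch uses a two-word pattern, chosen only for illustration, in the same (...)\W shape as regex2:

```javascript
// Same shape as regex2: a group of words followed by one non-word character
var regex2 = /(hello|world)\W/g;

// Without a trailing space, "world" at the very end has nothing for \W to match
var withoutSpace = "hello, world".match(regex2);
// Appending a space lets the last word match too
var withSpace = ("hello, world" + " ").match(regex2);

console.log(JSON.stringify(withoutSpace)); // ["hello,"]
console.log(JSON.stringify(withSpace));    // ["hello,","world "]

// \W was part of each match, so strip it before using the word as a key
var cleaned = withSpace.map(function (m) { return m.replace(/\W/g, ""); });
console.log(JSON.stringify(cleaned));      // ["hello","world"]
```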
