Using regular expressions to find the difference between two strings in JavaScript

Time: 2022-04-30 12:47:06

Regex experts, please help me see whether this problem can be solved with regex:

Given that string 1 is any string

and string 2 is any string containing all parts of string 1 (but not as a simple match; I will give an example),

how can I use regex to replace all parts of string 1 in string 2 with blanks, so that what remains is the part of string 2 that is not in string 1?

For example: str1 = "test xyz"; str2 = "test ab xyz"

I want " ab" or "ab " back. What regex can I write so that when I run a replace function on str2, it returns " ab"?

Here is some non-regex code:

            function findStringDiff(str1, str2) {
                var compareString = function(str1, str2) {
                    var a1 = str1.split("");
                    var a2 = str2.split("");
                    var idx2 = 0;
                    a1.forEach(function(val) {
                        if (a2[idx2] === val) {
                          a2.splice(idx2,1);
                        } else {
                            idx2 += 1;
                        }
                    });
                    if (idx2 > 0) {
                        a2.splice(idx2,a2.length);
                    }
                    return a2.join("");
                }

                if (str1.length < str2.length) {
                    return compareString(str1, str2);
                } else {
                    return compareString(str2, str1);
                }
            }

            console.log(findStringDiff("test xyz","test ab xyz"));

4 Solutions

#1


11  

Regexes only recognize whether a string matches a certain pattern. They're not flexible enough to do comparisons like the one you're asking for. You would have to take the first string, build a regular language from it to recognize the second string, and then use match groups to grab the other parts of the second string and concatenate them. Here's something that does what I think you want in a readable way.

//assuming "b" contains a subsequence containing 
//all of the letters in "a" in the same order
function getDifference(a, b)
{
    var i = 0;
    var j = 0;
    var result = "";

    while (j < b.length)
    {
        if (a[i] != b[j] || i == a.length)
            result += b[j];
        else
            i++;
        j++;
    }
    return result;
}

console.log(getDifference("test fly", "test xy flry"));

Here's a jsfiddle for it: http://jsfiddle.net/d4rcuxw9/1/
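
The function above relies on its stated assumption that "a" is a subsequence of "b". As a minimal sketch of how that assumption could be checked, the variant below does the same two-pointer walk but returns null when "a" was not fully consumed. The name getDifferenceChecked is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): same walk as
// getDifference, but returns null when `a` is not a subsequence of `b`.
function getDifferenceChecked(a, b) {
    var i = 0;
    var result = "";
    for (var j = 0; j < b.length; j++) {
        if (i < a.length && a[i] === b[j]) {
            i++;            // matched the next character of a; skip it
        } else {
            result += b[j]; // extra character that only b has
        }
    }
    // If a was not fully consumed, the subsequence assumption was violated.
    return i === a.length ? result : null;
}

console.log(getDifferenceChecked("test xyz", "test ab xyz")); // "ab "
console.log(getDifferenceChecked("test xyz", "test ab"));     // null
```

Because the walk matches each character of "a" as early as possible, it returns "ab " rather than " ab" for the example in the question; both were acceptable to the asker.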

#2


1  

I find this question really interesting. Even though I'm a little late, I would like to share my solution on how to accomplish this with regex. The solution is concise but not very readable.

While I like it for its conciseness, I probably would not use it in my code, because its opacity reduces maintainability.

var str1 = "test xyz",
    str2 = "test ab xyz",
    replacement = '',
    difference;
// Escape each character of str1, then join the characters with capture groups
var regex = new RegExp(str1.split('').map(function(char){
    return char.replace(/[.(){}+*?[|\]\\^$]/, '\\$&');
}).join('(.*)'));
if(regex.test(str2)){
    // One backreference per gap between adjacent characters of str1
    for(var i=1; i<str1.length; i++) replacement = replacement.concat('$' + i);
    difference = str2.replace(regex, replacement);
} else {
    alert ('str2 does not contain str1');
}

The regular expression for "test xyz" is /t(.*)e(.*)s(.*)t(.*) (.*)x(.*)y(.*)z/ and replacement is "$1$2$3$4$5$6$7".

The code is no longer concise, but it works now even if str1 contains special characters.
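
As a hedged sketch, the same idea can be wrapped in a reusable function. The names escapeRegExp and regexDifference are hypothetical, and this version differs from the answer in three ways that are my assumptions rather than the author's method: it anchors the pattern and adds a leading and a trailing group, it uses [\s\S]* so that gaps may span line breaks, and it joins the captured groups from exec() instead of building a "$1$2..." replacement string.

```javascript
// Hypothetical helpers; the names are illustrative, not from the answer.
function escapeRegExp(ch) {
    // Escape characters that have a special meaning in a regex
    return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function regexDifference(str1, str2) {
    if (str1.length === 0) return str2;
    // One capture group between every pair of adjacent characters of str1
    var pattern = str1.split('').map(escapeRegExp).join('([\\s\\S]*)');
    // Leading and trailing groups capture extra text before and after str1
    var regex = new RegExp('^([\\s\\S]*)' + pattern + '([\\s\\S]*)$');
    var match = regex.exec(str2);
    if (match === null) return null; // str2 does not contain str1
    // match[0] is the whole string; the remaining entries are the gaps
    return match.slice(1).join('');
}

console.log(regexDifference("test xyz", "test ab xyz")); // " ab"
console.log(regexDifference("a.c", "abc"));              // null
```

A pattern with this many greedy groups can backtrack heavily on long inputs that do not match, so this is better suited to short strings.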

#3


-2  

To find out whether there are extra '.' characters, as you are asking, you can do this:

result = "$1...00".match(/\$1\.(\.*)?00/)[1];

result then holds the extra '.' characters found. You cannot compare strings using regex alone. Perhaps use this, then compare the results.

You can also try this:

result = "$1...00".match(/(\$)(\d+)\.(\.*)?(\d+)/);
// Outputs: ["$1...00", "$", "1", "..", "00"]

This extracts the various parts so that you can compare them.

#4


-2  

If you are only concerned with testing whether a given string contains two or more sequential dot '.' characters:

var string = '$1..00',
    regexp = /(\.\.+)/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

If you need it to match the currency format:

var string = '$1..00',
    regexp = /\$\d*(\.\.+)(?:\d\d)+/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings.

So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".

The * tag wiki provides an excellent overview and basic reference for RegEx. See: https://*.com/tags/regex/info
