String comparison: individual comparisons vs. comparing appended strings

Time: 2022-05-12 23:00:00

I have six string variables, say str11, str12, str13, str21, str22 and str23.

I need to compare combinations of these variables.

The combinations I have to check are str11 -- str12 -- str13 as one group and str21 -- str22 -- str23 as the other group. I have to compare these two groups.

Now I'm confused about which method I should use for the comparison.

Can I append the strings of the same group and compare, which is only one comparison, say ( str11 append str12 append str13 ) equals ( str21 append str22 append str23 )

Or

Should I go for 3 individual comparisons?

if (str11.equals(str21)) {
    if (str12.equals(str22)) {
        if (str13.equals(str23)) {
            // all three pairs match
        }
    }
}

What performance cost does string length cause when I do a string comparison? Let us assume all strings are of (approximately) the same length.

8 solutions

#1


I’d test individually.

Is “AB” “CD” “EF” equal to “ABC” “DE” “F”?

Methinks not.

P.S. If it is, then it’s a VERY special case, and if you decide to code it that way (as a concatenated comparison) then comment the hell out of it.

#2


Splitting the comparison into three if statements is definitely not necessary. You could also simply AND your comparisons together, e.g.

if (   str11.equals(str21)
    && str12.equals(str22)
    && str13.equals(str23)) ...
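
A small sketch of why the single && condition costs no more than the nested ifs: && short-circuits, so later comparisons are skipped once one pair differs. The class name, the eq helper and the calls counter are hypothetical and exist only to make the skipped comparisons visible.

```java
class ShortCircuitDemo {
    // Hypothetical counter: how many comparisons actually ran.
    static int calls = 0;

    // Hypothetical helper that wraps String.equals and counts each call.
    static boolean eq(String a, String b) {
        calls++;
        return a.equals(b);
    }

    // Same shape as the single if-condition: three comparisons joined by &&.
    static boolean groupsEqual(String a1, String a2, String a3,
                               String b1, String b2, String b3) {
        return eq(a1, b1) && eq(a2, b2) && eq(a3, b3);
    }

    public static void main(String[] args) {
        // The first pair differs, so the second and third comparisons never run.
        System.out.println(groupsEqual("x", "y", "z", "q", "y", "z")); // false
        System.out.println(calls); // 1
    }
}
```

Behaviourally this is the same as the nested form in the question: both stop at the first mismatch.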

#3


Your variable names indicate a major code smell. It sounds like instead of having six variables, you should have two arrays, each containing three strings. In other words, something like this initially would have been much better:

String[][] strs = new String[2][3];
strs[0][0] = str11;
strs[0][1] = str12;
...

Chances are that, depending on where you obtained the six strings from, you will not need to do this manually immediately before the comparison, but can likely pass in your arguments in a friendlier format.

If you do wish to do this by comparing arrays of the string objects, and you are using Java 1.5 or above, remember that you have access to the java.util.Arrays.equals() methods for array equality. Using library methods as much as possible is a great way to avoid the extra effort of reinventing the wheel, as well as possible implementation mistakes (both submitted implementations so far have bugs, for example).
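
A minimal sketch of the library route, assuming the two groups are already held in String[] arrays. The class and method names are made up for illustration; Arrays.equals itself is the standard library call mentioned above.

```java
import java.util.Arrays;

class ArrayEqualsDemo {
    // Hypothetical wrapper: element-wise comparison of two groups.
    static boolean sameGroup(String[] group1, String[] group2) {
        // Arrays.equals checks the lengths and then each pair of elements;
        // null arrays and null elements are handled without exceptions.
        return Arrays.equals(group1, group2);
    }

    public static void main(String[] args) {
        String[] a = {"AB", "CD", "EF"};
        String[] b = {"AB", "CD", "EF"};
        String[] c = {"ABC", "DE", "F"};

        System.out.println(sameGroup(a, b)); // true
        System.out.println(sameGroup(a, c)); // false
        // Note: a.equals(b) would compare references, not contents.
        System.out.println(a.equals(b));     // false
    }
}
```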

The exact route you take probably depends on the domain you are writing for. If your particular problem requires you to always compare 3-tuples, then writing the code to explicitly compare groups of three strings would not be such a bad idea, as it would probably be more immediately understandable than code that compares arrays of arbitrary length. (If you are going this route, then by all means use a single if() conditional with && instead of nested if blocks, as Adam Bellaire demonstrated.)

In general though, you'll have a much more reusable block of code if you set it up to work with arrays of arbitrary length.

#4


Appending the strings together and comparing will not work. For instance, strings 1 and 2 could be empty and string 3 could contain "gorps", while string 4 contains "gorps" and 5 and 6 are empty. A comparison of the appended results would return true, though that would be a false positive. You would have to come up with a delimiter that you can guarantee is not contained in any of the strings to get this to work, and that could get messy.

I would just do the comparison the way you are doing it. It's readable and straightforward.
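
The false positive is easy to reproduce. The sketch below uses hypothetical method names and assumes non-null strings; it also shows that a delimiter only helps if it can never occur inside the data (a comma is used here purely as an example).

```java
class ConcatPitfallDemo {
    // Compare by appending each group into one string.
    static boolean concatEquals(String a1, String a2, String a3,
                                String b1, String b2, String b3) {
        return (a1 + a2 + a3).equals(b1 + b2 + b3);
    }

    // Compare each position separately.
    static boolean pairwiseEquals(String a1, String a2, String a3,
                                  String b1, String b2, String b3) {
        return a1.equals(b1) && a2.equals(b2) && a3.equals(b3);
    }

    // Compare after joining with a delimiter (String.join needs Java 8+).
    static boolean joinedEquals(String a1, String a2, String a3,
                                String b1, String b2, String b3) {
        return String.join(",", a1, a2, a3).equals(String.join(",", b1, b2, b3));
    }

    public static void main(String[] args) {
        // Group boundaries are lost when the strings are appended.
        System.out.println(concatEquals("", "", "gorps", "gorps", "", ""));   // true
        System.out.println(pairwiseEquals("", "", "gorps", "gorps", "", "")); // false

        // A delimiter fixes that case...
        System.out.println(joinedEquals("", "", "gorps", "gorps", "", ""));   // false
        // ...but fails again as soon as the data itself contains the delimiter.
        System.out.println(joinedEquals("a,b", "c", "", "a", "b,c", ""));     // true
    }
}
```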

#5


Iterating over one large char[] is probably faster than iterating over n separate strings of the same total length. This is because the data is very local and the CPU has an easy time prefetching it.

However, when you concatenate multiple strings in Java you will use a StringBuilder/StringBuffer and then, in several cases, convert it back to a String. This will cause increased memory allocation due to how SB.append() works and because Java Strings are immutable, which in turn can create a memory bottleneck and slow down your application significantly.

I would recommend keeping the Strings as they are and doing separate comparisons. The gain in performance due to a longer char[] is most likely far smaller than the problems you can run into with the higher allocation rate.
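
To make the allocation argument concrete, here is a rough sketch of the work hidden behind a concatenated comparison. It is an assumption about what the + operator amounts to, written out by hand with StringBuilder; what a given compiler or JVM actually emits can differ between versions. All names are illustrative.

```java
class ConcatCostSketch {
    // Hand-written equivalent of (a + b + c): one builder plus one new String.
    static String concat(String a, String b, String c) {
        return new StringBuilder(a.length() + b.length() + c.length())
                .append(a).append(b).append(c)
                .toString();
    }

    // Builds two temporary strings (and two builders) just to compare them once.
    static boolean concatCompare(String a1, String a2, String a3,
                                 String b1, String b2, String b3) {
        return concat(a1, a2, a3).equals(concat(b1, b2, b3));
    }

    // Compares the existing strings in place; nothing new is allocated.
    static boolean separateCompare(String a1, String a2, String a3,
                                   String b1, String b2, String b3) {
        return a1.equals(b1) && a2.equals(b2) && a3.equals(b3);
    }

    public static void main(String[] args) {
        System.out.println(concatCompare("foo", "bar", "baz", "foo", "bar", "baz"));   // true
        System.out.println(separateCompare("foo", "bar", "baz", "foo", "bar", "baz")); // true
    }
}
```

The sketch only shows where the extra objects come from; it makes no claim about measured timings.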

#6


With all respect: I think your code and question not only smell a bit, but almost stink (big smiley here).

1) The variable names indicate that you actually have string vectors around, as already mentioned.
2) The question of individual compares vs. a concatenated compare raises the question of how you define equality of your string tuples, also already mentioned.

But what strikes me most:

3) To me that looks like a typical case of "premature optimization" and counting CPU cycles at the wrong place.

If you really care about performance, forget about the cost of 3 individual compares versus a single compare. Instead:

How about the added overhead of creating two concatenated strings?

  (str11 + str12 + str13).equals(str21 + str22 + str23)

Let's analyze that w.r.t. the memory manager and the operations to be done. At the low level, that translates to 4 additional memory allocations, 2 additional strcpy's, and another 4 additional strcat or strcpy operations (depending on how the VM does it; but most would use another strcpy). Then a single compare is called for, which does not first count the characters using strlen; instead it either knows the size in advance (if the object header also includes the number of chars, which is likely) or it simply runs up to a 0-byte. That is called once vs. 3 times. The actual number of chars to compare is roughly the same (forget about the extra 0-bytes). That leaves us with 2 additional calls to strcmp (a few ns), vs. the overhead I described above (a few µs). If we add the GC reclamation overhead (0 allocations vs. 4), I'd say that your "optimized" solution can easily be 100 to 1000 times slower than the 3 strcmps!

Additional Notice:
Theoretically, the JITter could optimize it or some of it, and actually generate code as suggested by Adam Bellaire, but I doubt that any JIT developer cares to optimize such code. By the way, the system's string routines (aka String operations) are usually MUCH faster than hand coding, so do not start to loop over individual characters yourself.

#7


I would put the two groups into two arrays, and then loop over the arrays to compare the individual strings. A good example is already in the answers, given by Markus Lausberg.

I would not be concerned about performance costs. Just write it in the most readable way possible. The Java compiler is very good at performance optimizations.

Example method:

    public boolean compareGroups(String[] group1, String[] group2) {
        // Groups of different sizes can never be equal.
        if (group1.length != group2.length) {
            return false;
        }

        // Stop at the first pair that differs.
        for (int i = 0; i < group1.length; i++) {
            if (!group1[i].equals(group2[i])) {
                return false;
            }
        }

        return true;
    }

And calling the method is of course simple:

    String[] group1 = new String[]{"String 1", "String 2", "String 3"};
    String[] group2 = new String[]{"String 1", "String 2", "String 3"};

    boolean result = compareGroups(group1, group2);

#8


I would use the simple way:

Dynamically run over all array elements of both arrays.

    // Assumes str1 and str2 are String[] of the same length with no null elements.
    boolean isEqual = true;
    for (int n = 0; n < str1.length; ++n) {
        isEqual &= str1[n].equals(str2[n]);
    }

    return isEqual;
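
The loop above throws if str2 is shorter than str1 or if an element of str1 is null, and it reports equality when str2 merely starts with the same elements. A guarded variant might look like the sketch below; the class and method names are hypothetical, and Objects.equals assumes Java 7 or later.

```java
import java.util.Objects;

class GroupCompare {
    // Null-safe, length-checked comparison of two groups of strings.
    static boolean groupsEqual(String[] g1, String[] g2) {
        if (g1 == g2) {
            return true; // same array, or both null
        }
        if (g1 == null || g2 == null || g1.length != g2.length) {
            return false;
        }
        for (int i = 0; i < g1.length; i++) {
            // Objects.equals treats two nulls as equal and never throws.
            if (!Objects.equals(g1[i], g2[i])) {
                return false; // stop at the first mismatch
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(groupsEqual(new String[]{"a", "b"}, new String[]{"a", "b"}));      // true
        System.out.println(groupsEqual(new String[]{"a", "b"}, new String[]{"a", "b", "c"})); // false
        System.out.println(groupsEqual(new String[]{"a", null}, new String[]{"a", null}));    // true
    }
}
```

This gives the same results as java.util.Arrays.equals for String arrays, so in practice the library call is the shorter choice.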
