localeCompare shows inconsistent behavior when sorting words with leading umlauted characters

Tested in latest Firefox and Chrome (which have a 'de' locale on my system):

"Ä".localeCompare("A")

gives me 1, meaning that it believes "Ä" should appear after "A" in a sorted order, which is correct.

But:

"Ägypten".localeCompare("Algerien")

gives me -1, meaning that it believes "Ägypten" should appear before "Algerien" in a sorted order.

Why? Why does it look past the first character of each string, if it says that the first character of the first string should appear after the first character of the second string when you check it on its own?

2 solutions

#1

Here is a method that does just what you need; copy and paste it:

It walks the strings recursively and returns the locale comparison result of the characters rather than of the whole strings :)

Final version, bug fixed: added a comparison of the entire strings (to avoid an incorrect stop or an endless recursive loop):

// Compares character by character, so the locale order of the leading
// characters wins over the whole-string collation result.
String.prototype.MylocaleCompare = function (right) {
    // Nothing to index into: let localeCompare handle empty strings.
    if (this.length === 0 || right.length === 0) {
        return this.localeCompare(right);
    }

    var first = this[0].localeCompare(right[0]);

    // Keep recursing only while both strings have more than one character.
    var run = Math.min(this.length, right.length) > 1;
    if (!run) {
        return first === 0 ? this.localeCompare(right) : first;
    }

    // Whole-string and first-character results agree: trust localeCompare.
    if (this.localeCompare(right) == first) {
        return this.localeCompare(right);
    }

    // They disagree: look at the remainders of both strings.
    var myLeft = this.slice(1);
    var myRight = right.slice(1);
    if (myLeft.localeCompare(myRight) != myLeft[0].localeCompare(myRight[0])) {
        return myLeft.MylocaleCompare(myRight);
    }
    return first === 0 ? myLeft.MylocaleCompare(myRight) : first;
};
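
For comparison, the same idea (the first differing character decides) can be written as a plain loop. This is only a sketch and is not guaranteed to match the method above in every case; compareCharByChar and its locales parameter are hypothetical names introduced here, and the strings are normalized to NFC on the assumption that an umlaut should count as a single character.

```javascript
// Hypothetical helper, not part of the method above: compares two strings
// one character at a time, so the first differing character decides.
function compareCharByChar(left, right, locales) {
    // NFC keeps "A" + combining diaeresis together as a single "Ä";
    // Array.from splits by code point rather than by UTF-16 unit.
    var a = Array.from(left.normalize("NFC"));
    var b = Array.from(right.normalize("NFC"));
    var n = Math.min(a.length, b.length);

    for (var i = 0; i < n; i++) {
        var result = a[i].localeCompare(b[i], locales);
        if (result !== 0) {
            return result;
        }
    }

    // All shared positions compare equal: the shorter string sorts first.
    return a.length - b.length;
}

console.log(compareCharByChar("Ägypten", "Algerien", "de"));
```

Because it does not touch String.prototype, it can be passed to sort through a small wrapper such as function (x, y) { return compareCharByChar(x, y, "de"); }.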

#2

http://en.wikipedia.org/wiki/Diaeresis_(diacritic)#Printing_conventions_in_German

“When alphabetically sorting German words, the umlaut is usually not distinguished from the underlying vowel, although if two words differ only by an umlaut, the umlauted one comes second […]
“There is a second system in limited use, mostly for sorting names (colloquially called "telephone directory sorting"), which treats ü like ue, and so on.”

Assuming the second kind of sorting algorithm is applied, then the results you are seeing make sense.

Ä would become Ae, and that is “longer” than your other value A, so sorting A before Ae and therefore A before Ä would be correct (and as you said yourself, you consider this to be correct; and even by the first algorithm that just treats Ä as A it would be correct, too).

Now Ägypten becomes Aegypten for sorting purposes, and therefore it has to appear before Algerien in the same sorting logic – the first letters of both terms are equal, so it is up to the second ones to determine sort order, and e has a lexicographically lower sort value than l. Therefore, Aegypten before Algerien, meaning Ägypten before Algerien.

German Wikipedia elaborates even more about this (http://de.wikipedia.org/wiki/Alphabetische_Sortierung#Einsortierungsregeln_f.C3.BCr_weitere_Buchstaben), and notes that there are two variants of the relevant DIN 5007.

DIN 5007 variant 1 says that ä is to be treated as a, ö as o and ü as u, and that this kind of sorting is to be used for dictionaries and the like.

DIN 5007 variant 2 says the other thing: ä is to be treated as ae, etc., and this is to be used mainly for name listings such as telephone books.

Wikipedia goes on to say that this takes into account that there might be more than one form of spelling for personal names (someone’s last name might be Moeller or Möller, both versions exist), whereas for words in a dictionary there is usually only one spelling that is considered correct.
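
To make the difference between the two variants concrete, here is a rough sketch that builds a sort key per variant and compares the keys. It is an illustration under simplifying assumptions, not an implementation of DIN 5007: din5007Key and compareDin5007 are hypothetical names, only ä, ö and ü are handled, case is ignored, and keys are compared by plain code unit order.

```javascript
// Hypothetical helper: lower-cases the word and replaces umlauts
// according to the chosen variant (1: ä -> a, 2: ä -> ae).
function din5007Key(word, variant) {
    var map = variant === 2
        ? { "ä": "ae", "ö": "oe", "ü": "ue" }
        : { "ä": "a", "ö": "o", "ü": "u" };
    return word.normalize("NFC").toLowerCase().replace(/[äöü]/g, function (ch) {
        return map[ch];
    });
}

// Hypothetical comparator built on the keys above.
function compareDin5007(left, right, variant) {
    var a = din5007Key(left, variant);
    var b = din5007Key(right, variant);
    if (a < b) return -1;
    if (a > b) return 1;
    // Equal keys: crude tie-break by code unit order, which puts the
    // umlauted spelling second for words of the same case.
    return left < right ? -1 : (left > right ? 1 : 0);
}

console.log(compareDin5007("Ägypten", "Algerien", 1));
console.log(compareDin5007("Ägypten", "Algerien", 2));
console.log(compareDin5007("Ähre", "Affe", 1));
console.log(compareDin5007("Ähre", "Affe", 2));
```

With these keys, Ägypten sorts before Algerien under both variants (agypten and aegypten both come before algerien), so that pair alone does not tell the variants apart. A pair such as Ähre and Affe does: ahre sorts after affe, while aehre sorts before it.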

Now, I guess the big question remaining is: can I get browsers to apply the other form of sorting for the German locale? To be frank, I don’t know.

It would surely be desirable to be able to choose between those two forms of sorting because, as Wikipedia says, the personal names Moeller and Möller both exist, but there is only Ägypten and not Aegypten when it comes to a dictionary.
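
One avenue that may be worth exploring is the ECMAScript Internationalization API, where a collation variant can be requested through the Unicode extension key co in the locale tag. The sketch below assumes the engine ships German collation data and honours the phonebk variant, which is not guaranteed everywhere; makeGermanCollator is a hypothetical helper name, and resolvedOptions() is used to check what the engine actually applied.

```javascript
// Hypothetical helper: builds a collator for German, optionally asking for
// the phonebook variant through the "co" Unicode extension key.
function makeGermanCollator(variant) {
    var tag = variant === "phonebk" ? "de-u-co-phonebk" : "de";
    return new Intl.Collator(tag);
}

var dictionary = makeGermanCollator();
var phonebook = makeGermanCollator("phonebk");

// The engine may silently fall back when it lacks the requested data,
// so inspect what it resolved before relying on the order.
console.log(dictionary.resolvedOptions().collation);
console.log(phonebook.resolvedOptions().collation);

// A pair on which the two variants can disagree.
var words = ["Ähre", "Affe"];
console.log(words.slice().sort(dictionary.compare));
console.log(words.slice().sort(phonebook.compare));
```

The same locale tag can also be passed as the second argument of localeCompare, for example "Ähre".localeCompare("Affe", "de-u-co-phonebk"), in engines that support the locales argument.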
