Why isn't đ flattened to d when removing accents / diacritics?

I'm using this method to remove accents from my strings:


    static string RemoveAccents(string input)
    {
        string normalized = input.Normalize(NormalizationForm.FormKD);
        StringBuilder builder = new StringBuilder();
        foreach (char c in normalized)
        {
            // Keep everything except combining (non-spacing) marks.
            if (char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }
        return builder.ToString();
    }

But this method leaves đ as đ and doesn't change it to d, even though d is its base character. You can try it with this input string: "æøåáâăäĺćçčéęëěíîďđńňóôőöřůúűüýţ"


What's so special about the letter đ?


The answer for why it doesn't work is that the statement that "d is its base char" is false. U+0111 (LATIN SMALL LETTER D WITH STROKE) has Unicode category "Letter, Lowercase" and has no decomposition mapping (i.e., it doesn't decompose to "d" followed by a combining mark).

"đ".Normalize(NormalizationForm.FormD) simply returns "đ", which is not stripped out by the loop because it is not a non-spacing mark.
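One way to see this for yourself is to print the code points that come out of normalization. The snippet below is only a sketch; `DecompositionProbe` and `Describe` are names invented for this illustration, not part of .NET.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DecompositionProbe
{
    // Returns the code points of the FormD decomposition of s,
    // separated by spaces, e.g. "U+0065 U+0301".
    public static string Describe(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        var parts = new StringBuilder();
        foreach (char c in decomposed)
        {
            if (parts.Length > 0) parts.Append(' ');
            parts.Append("U+").Append(((int)c).ToString("X4"));
        }
        return parts.ToString();
    }
}
```

`DecompositionProbe.Describe("\u00E9")` (é) gives `U+0065 U+0301`, a plain "e" followed by a combining acute accent that the loop can strip. `DecompositionProbe.Describe("\u0111")` (đ) gives just `U+0111`, so there is nothing for the loop to remove.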


A similar issue will exist for "ø" and other letters for which Unicode provides no decomposition mapping. (And if you're trying to find the "best" ASCII character to represent a Unicode letter, this approach won't work at all for Cyrillic, Greek, Chinese or other non-Latin alphabets; you'll also run into problems if you wanted to transliterate "ß" into "ss", for example. Using a library like UnidecodeSharp may help.)
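If you only care about a handful of such letters, one possible workaround is to combine the normalization loop with a small hand-written replacement table. This is a minimal sketch, not a complete transliteration scheme: the `AccentFolding` class, the `Fold` method and the contents of the table are all assumptions made for illustration, and the table is deliberately incomplete.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class AccentFolding
{
    // Hypothetical, incomplete table of letters that have no Unicode
    // decomposition mapping. Extend it for the data you actually have.
    static readonly Dictionary<char, string> Fallback = new Dictionary<char, string>
    {
        { '\u0111', "d" },  // đ
        { '\u0110', "D" },  // Đ
        { '\u00F8', "o" },  // ø
        { '\u00D8', "O" },  // Ø
        { '\u00E6', "ae" }, // æ
        { '\u00DF', "ss" }, // ß
        { '\u0142', "l" },  // ł
    };

    public static string Fold(string input)
    {
        string normalized = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            // Drop combining marks left behind by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Replace letters listed in the table; keep everything else as-is.
            if (Fallback.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this table, `AccentFolding.Fold("\u0111\u00E9\u00F8")` (đéø) returns `"deo"` and `AccentFolding.Fold("Stra\u00DFe")` returns `"Strasse"`. Anything not in the table and without a decomposition still passes through unchanged, which is why a dedicated library is the better choice for broad coverage.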



I have to admit that I'm not sure why this works, but it sure seems to:


var str = "æøåáâăäĺćçčéęëěíîďđńňóôőöřůúűüýţ";
var noApostrophes = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str));

=> "aoaaaaalccceeeeiiddnnooooruuuuyt"



"D with stroke" (Wikipedia) is used in several languages, and appears to be considered a distinct letter in all of them -- and that is why it remains unchanged.



This should work:


    private static String RemoveDiacritics(string text)
    {
        String normalized = text.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < normalized.Length; i++)
        {
            Char c = normalized[i];
            // Append every character that is not a combining mark.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        return sb.ToString();
    }



