How can I transliterate Cyrillic text into Latin?

Date: 2022-06-16 09:51:41

I have a method that turns any Latin-script text (e.g. English, French, German, Polish) into its slug form,

e.g. Alpha Bravo Charlie => alpha-bravo-charlie

But it doesn't work for Cyrillic text (e.g. Russian), so what I want to do is transliterate the Cyrillic text to Latin characters, then slugify that.

Does anyone have a way to do such transliteration, whether as actual source code or as a library?

I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.

9 answers

#1


16  

You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic and many other scripts to Latin.

Example usage:

Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// With converting "ё" to "yo".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";

    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";

    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}

Simple, fast and powerful. It's also easy to extend or modify the transliteration table if you want to.

#2


12  

    public static string Translit(string str)
    {
        string[] lat_up = {"A", "B", "V", "G", "D", "E", "Yo", "Zh", "Z", "I", "Y", "K", "L", "M", "N", "O", "P", "R", "S", "T", "U", "F", "Kh", "Ts", "Ch", "Sh", "Shch", "\"", "Y", "'", "E", "Yu", "Ya"};
        string[] lat_low = {"a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "\"", "y", "'", "e", "yu", "ya"};
        string[] rus_up = {"А", "Б", "В", "Г", "Д", "Е", "Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы", "Ь", "Э", "Ю", "Я"};
        string[] rus_low = { "а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ", "ъ", "ы", "ь", "э", "ю", "я"};
        for (int i = 0; i < rus_up.Length; i++)
        {
            str = str.Replace(rus_up[i],lat_up[i]);
            str = str.Replace(rus_low[i],lat_low[i]);              
        }
        return str;
    }

#3


8  

Why can't you just take a transliteration table and make a small regex or subroutine?
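
A minimal sketch of that idea, assuming a BGN/PCGN-style table like the ones in the other answers. The names `SlugSketch`, `Transliterate` and `ToSlug` are made up for illustration, and dropping the hard and soft signs is my own assumption (they add nothing to a slug). It walks the string once with a lookup table, then uses a small regex to turn everything else into hyphens:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Lowercase Russian alphabet and one Latin counterpart per letter, in the same order.
    // Assumption: the hard and soft signs map to "" because they add nothing to a slug.
    private const string Cyrillic = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static readonly string[] Latin =
    {
        "a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p",
        "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "", "y", "", "e", "yu", "ya"
    };

    private static readonly Dictionary<char, string> Table = BuildTable();

    private static Dictionary<char, string> BuildTable()
    {
        var table = new Dictionary<char, string>();
        for (int i = 0; i < Cyrillic.Length; i++)
        {
            table[Cyrillic[i]] = Latin[i];
        }
        return table;
    }

    // Single pass over the lowercased input; characters outside the table are copied unchanged.
    public static string Transliterate(string text)
    {
        var sb = new StringBuilder(text.Length * 2);
        foreach (char c in text.ToLowerInvariant())
        {
            sb.Append(Table.TryGetValue(c, out string latin) ? latin : c.ToString());
        }
        return sb.ToString();
    }

    // Collapse every run of characters other than a-z and 0-9 into a single hyphen.
    public static string ToSlug(string text)
    {
        string latin = Transliterate(text);
        return Regex.Replace(latin, "[^a-z0-9]+", "-").Trim('-');
    }
}
```

With this table, `SlugSketch.ToSlug("Работа с кириллицей")` gives `rabota-s-kirillitsey`, and plain Latin input such as `Alpha Bravo Charlie` still comes out as `alpha-bravo-charlie`. Accented Latin letters are not handled here; they would be turned into hyphens by the regex.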

#4


4  

Microsoft has a transliteration tool which includes a DLL you could hook into (you would need to check the licensing restrictions if you're going to use it for anything other than personal use). You can read more about it in Dejan Vesić's blog post.

#5


4  

For future readers:

Windows 7+ can do this with its Extended Linguistic Services. (You'll need the Windows API Code Pack to do it from .NET.)

#6


4  

You can use my library for transliteration: https://github.com/nick-buhro/Translit
It is also available on NuGet.

Example:

var latin = Transliteration.CyrillicToLatin(
    "Предками данная мудрость народная!", 
    Language.Russian);

Console.WriteLine(latin);   
// Output: Predkami dannaya mudrost` narodnaya!

#7


3  

Check this code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace Transliter
{
    public partial class Form1 : Form
    {
        Dictionary<string, string> words = new Dictionary<string, string>();

        public Form1()
        {
            InitializeComponent();
            words.Add("а", "a");
            words.Add("б", "b");
            words.Add("в", "v");
            words.Add("г", "g");
            words.Add("д", "d");
            words.Add("е", "e");
            words.Add("ё", "yo");
            words.Add("ж", "zh");
            words.Add("з", "z");
            words.Add("и", "i");
            words.Add("й", "j");
            words.Add("к", "k");
            words.Add("л", "l");
            words.Add("м", "m");
            words.Add("н", "n");
            words.Add("о", "o");
            words.Add("п", "p");
            words.Add("р", "r");
            words.Add("с", "s");
            words.Add("т", "t");
            words.Add("у", "u");
            words.Add("ф", "f");
            words.Add("х", "h");
            words.Add("ц", "c");
            words.Add("ч", "ch");
            words.Add("ш", "sh");
            words.Add("щ", "sch");
            words.Add("ъ", "j");
            words.Add("ы", "i");
            words.Add("ь", "j");
            words.Add("э", "e");
            words.Add("ю", "yu");
            words.Add("я", "ya");
            words.Add("А", "A");
            words.Add("Б", "B");
            words.Add("В", "V");
            words.Add("Г", "G");
            words.Add("Д", "D");
            words.Add("Е", "E");
            words.Add("Ё", "Yo");
            words.Add("Ж", "Zh");
            words.Add("З", "Z");
            words.Add("И", "I");
            words.Add("Й", "J");
            words.Add("К", "K");
            words.Add("Л", "L");
            words.Add("М", "M");
            words.Add("Н", "N");
            words.Add("О", "O");
            words.Add("П", "P");
            words.Add("Р", "R");
            words.Add("С", "S");
            words.Add("Т", "T");
            words.Add("У", "U");
            words.Add("Ф", "F");
            words.Add("Х", "H");
            words.Add("Ц", "C");
            words.Add("Ч", "Ch");
            words.Add("Ш", "Sh");
            words.Add("Щ", "Sch");
            words.Add("Ъ", "J");
            words.Add("Ы", "I");
            words.Add("Ь", "J");
            words.Add("Э", "E");
            words.Add("Ю", "Yu");
            words.Add("Я", "Ya");
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string source = textBox1.Text;
            foreach (KeyValuePair<string, string> pair in words)
            {
                source = source.Replace(pair.Key, pair.Value);
            }
            textBox2.Text = source;
        }
    }
}

Cyrillic to Latin:

source = source.Replace(pair.Key, pair.Value);

Latin to Cyrillic:

source = source.Replace(pair.Value, pair.Key);
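
One caveat on the Latin to Cyrillic direction: the table above is not reversible as it stands. "j" stands for й, ъ and ь, "i" for и and ы, and "e" for е and э, and chained `Replace` calls consume "a" before "ya" ever gets a chance to match. A rough sketch of one way around the ordering problem is below, assuming a lowercase-only table where each ambiguous Latin key is arbitrarily assigned to a single letter. The names `ReverseTranslitSketch` and `ToCyrillic` are hypothetical. It tries the longest Latin key first at each position:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReverseTranslitSketch
{
    // Assumption: each Latin key is assigned to exactly one Cyrillic letter,
    // so ъ, ь, ы and э cannot be recovered from "j", "i" and "e".
    private static readonly Dictionary<string, string> Table = new Dictionary<string, string>
    {
        { "a", "а" }, { "b", "б" }, { "v", "в" }, { "g", "г" }, { "d", "д" }, { "e", "е" },
        { "yo", "ё" }, { "zh", "ж" }, { "z", "з" }, { "i", "и" }, { "j", "й" }, { "k", "к" },
        { "l", "л" }, { "m", "м" }, { "n", "н" }, { "o", "о" }, { "p", "п" }, { "r", "р" },
        { "s", "с" }, { "t", "т" }, { "u", "у" }, { "f", "ф" }, { "h", "х" }, { "c", "ц" },
        { "ch", "ч" }, { "sh", "ш" }, { "sch", "щ" }, { "yu", "ю" }, { "ya", "я" }
    };

    private const int MaxKeyLength = 3; // length of the longest key, "sch"

    public static string ToCyrillic(string latin)
    {
        var sb = new StringBuilder(latin.Length);
        int i = 0;
        while (i < latin.Length)
        {
            bool matched = false;
            // Try the longest candidate first so "sch" wins over "s" followed by "ch".
            for (int len = Math.Min(MaxKeyLength, latin.Length - i); len >= 1; len--)
            {
                if (Table.TryGetValue(latin.Substring(i, len), out string cyrillic))
                {
                    sb.Append(cyrillic);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched)
            {
                // Not in the table (spaces, digits, a lone "y", uppercase): copy as-is.
                sb.Append(latin[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Even with longest-match-first, the round trip is still lossy: a word where с is followed by х transliterates to "sh" and comes back as ш. If you need a reliable reverse direction, pick a scheme that was designed to be reversible rather than reusing this table.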

#8


0  

Here is a great article that describes how to make a C# equivalent of this JavaScript one.

string result = DisplayInEnglish("Олъга Виктровна Василенко");

#9


-1  

Use this method: pass it a string containing Cyrillic text, and it returns the corresponding Latin string.

public static string GetLatinCodeFromCyrillic(string str)
    {

        str = str.Replace("б", "b");
        str = str.Replace("Б", "B");

        str = str.Replace("в", "v");
        str = str.Replace("В", "V");

        str = str.Replace("г", "h");
        str = str.Replace("Г", "H");

        str = str.Replace("ґ", "g");
        str = str.Replace("Ґ", "G");

        str = str.Replace("д", "d");
        str = str.Replace("Д", "D");

        str = str.Replace("є", "ye");
        str = str.Replace("Є", "Ye");

        str = str.Replace("ж", "zh");
        str = str.Replace("Ж", "Zh");

        str = str.Replace("з", "z");
        str = str.Replace("З", "Z");

        str = str.Replace("и", "y");
        str = str.Replace("И", "Y");

        str = str.Replace("ї", "yi");
        str = str.Replace("Ї", "YI");

        str = str.Replace("й", "j");
        str = str.Replace("Й", "J");

        str = str.Replace("к", "k");
        str = str.Replace("К", "K");

        str = str.Replace("л", "l");
        str = str.Replace("Л", "L");

        str = str.Replace("м", "m");
        str = str.Replace("М", "M");

        str = str.Replace("н", "n");
        str = str.Replace("Н", "N");

        str = str.Replace("п", "p");
        str = str.Replace("П", "P");

        str = str.Replace("р", "r");
        str = str.Replace("Р", "R");

        str = str.Replace("с", "s");
        str = str.Replace("С", "S");

        str = str.Replace("ч", "ch");
        str = str.Replace("Ч", "CH");

        str = str.Replace("ш", "sh");
        str = str.Replace("Ш", "SH");

        str = str.Replace("щ", "shh");
        str = str.Replace("Щ", "SHH");

        str = str.Replace("ю", "yu");
        str = str.Replace("Ю", "YU");

        str = str.Replace("Я", "YA");
        str = str.Replace("я", "ya");

        str = str.Replace('ь', '"');
        str = str.Replace("Ь", "");

        str = str.Replace('т', 't');
        str = str.Replace("Т", "T");

        str = str.Replace('ц', 'c');
        str = str.Replace("Ц", "C");

        str = str.Replace('о', 'o');
        str = str.Replace("О", "O");

        str = str.Replace('е', 'e');
        str = str.Replace("Е", "E");

        str = str.Replace('а', 'a');
        str = str.Replace("А", "A");

        str = str.Replace('ф', 'f');
        str = str.Replace("Ф", "F");

        str = str.Replace('і', 'i');
        str = str.Replace("І", "I");

        str = str.Replace('У', 'U');
        str = str.Replace("у", "u");

        str = str.Replace('х', 'x');
        str = str.Replace("Х", "X");
        return str;
    }
