Implementing internationalization (language strings) in a PHP application

Date: 2021-08-21 12:42:34

I want to build a CMS that can fetch locale strings to support internationalization. I plan to store the strings in a database and place a key/value cache such as memcache between the database and the application, so that pages don't take a performance hit from querying the database for every translation.
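A minimal sketch of the lookup path described above. A plain array stands in for memcache and a closure stands in for the database query; both are assumptions for illustration, as are the function name and the cache key layout.

```php
<?php
/**
 * Cache-aside lookup: try the cache first, fall back to the database,
 * then store the result in the cache for the next request.
 *
 * $cache is a plain array standing in for memcache and $fetchFromDb is a
 * callable standing in for the real query (both hypothetical).
 */
function get_string(string $locale, string $key, array &$cache, callable $fetchFromDb): ?string
{
    $cacheKey = "i18n:$locale:$key";
    if (array_key_exists($cacheKey, $cache)) {
        return $cache[$cacheKey];
    }

    $value = $fetchFromDb($locale, $key);
    if ($value !== null) {
        $cache[$cacheKey] = $value;
    }
    return $value;
}

// Fake "database" rows and a query counter to show the cache working
$rows = ['en' => ['greeting' => 'Hello'], 'de' => ['greeting' => 'Hallo']];
$queries = 0;
$fetch = function (string $locale, string $key) use ($rows, &$queries) {
    $queries++;
    return $rows[$locale][$key] ?? null;
};

$cache = [];
echo get_string('de', 'greeting', $cache, $fetch), "\n"; // Hallo (from the "database")
echo get_string('de', 'greeting', $cache, $fetch), "\n"; // Hallo (from the cache)
echo $queries, "\n";                                     // 1
```

With a real cache the array would be replaced by get/set calls on the cache client, and an edit in the admin system would need to delete or overwrite the matching cache entry.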

This is more complex than using PHP files with arrays of strings - but that method is incredibly inefficient when you have 2,000 translation lines.

I thought about using gettext, but I'm not sure that users of the CMS will be comfortable working with gettext files. If the strings are stored in a database, a nice administration system can be set up to let them make changes whenever they want, and the caching in RAM will ensure that fetching those strings is as fast as, or faster than, gettext. I also don't feel safe using the PHP extension, considering that not even Zend Framework uses it.

Is there anything wrong with this approach?

Update

I thought I would add some more food for thought. One of the problems with string translations is that they don't support dates, money, or conditional statements. However, thanks to intl, PHP now has MessageFormatter, which is what really needs to be used anyway.

// Load string from gettext file
$string = _("{0} resulted in {1,choice,0#no errors|1#single error|1<{1, number} errors}");

// Format using the current locale
msgfmt_format_message(setlocale(LC_ALL, 0), $string, array('Update', 3));

On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.

So, it makes sense to use keys like _('error.404_not_found') which then allow the content writers and translators to just worry about the PO/MO files without messing in the code.

However, if a gettext translation doesn't exist for the given key, there is no way to fall back to a default (as you could with a custom handler). This means that you either have the writer mucking around in your code, or have "error.404_not_found" shown to users who don't have a translation for their locale!
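For illustration, a custom handler of the kind mentioned above could be sketched as follows. gettext returns the message id unchanged when it finds no translation, so a wrapper can detect that case and substitute a default text. The function name, the defaults table, and the identity stand-in used when the gettext extension is not loaded are all assumptions.

```php
<?php
// Hypothetical default (primary language) texts keyed by message id
$defaults = [
    'error.404_not_found' => 'The page you requested could not be found.',
];

/**
 * Translate $key with $lookup. If the lookup hands the key back unchanged
 * (which is what gettext does for a missing translation), use the default
 * text instead, and only show the raw key as a last resort.
 */
function translate_with_fallback(string $key, array $defaults, callable $lookup): string
{
    $translated = $lookup($key);
    if ($translated !== $key) {
        return $translated;
    }
    return $defaults[$key] ?? $key;
}

// Use the gettext extension when it is loaded, otherwise an identity stand-in
$lookup = function_exists('gettext')
    ? 'gettext'
    : function ($key) { return $key; };

echo translate_with_fallback('error.404_not_found', $defaults, $lookup), "\n";
```

The defaults table could live in the same place as the other translations, so content writers would still not need to touch the source code.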

In addition, I am not aware of any large projects which use PHP's gettext. I would appreciate any links to well-used (and therefore tested) systems which actually rely on the native PHP gettext extension.

10 Answers

#1


6  

Gettext reads compiled binary catalogs (.mo files), so lookups are quite quick. The gettext implementation is also usually simpler, as it only requires echo _('Text to translate');. It also has existing tools for translators to use, and they're proven to work well.

You can store them in a database, but I feel it would be slower and a bit overkill, especially since you'd have to build the system to edit the translations yourself.

If only you could actually cache the lookups in a dedicated memory portion in APC, you'd be golden. Sadly, I don't know how.

#2


5  

For those who are interested, it seems full support for locales and i18n in PHP is finally starting to take shape.

// Set the current locale to the one the user agent wants
$locale = Locale::acceptFromHttp(getenv('HTTP_ACCEPT_LANGUAGE'));

// Default Locale
Locale::setDefault($locale);
setlocale(LC_ALL, $locale . '.UTF-8');

// Default timezone of server
date_default_timezone_set('UTC');

// Default charset (iconv_set_encoding() is deprecated since PHP 5.6)
ini_set('default_charset', 'UTF-8');

// multibyte encoding
mb_internal_encoding('UTF-8');

There are several things to consider. Detecting the timezone and locale, and then using them to correctly parse and display input and output, is important. There is a PHP I18N library that was just released which contains lookup tables for much of this information.

Processing user input is important to make sure your application has clean, well-formed UTF-8 strings from whatever input the user enters. iconv is great for this.

/**
 * Convert a string from one encoding to another encoding
 * and remove invalid bytes sequences.
 *
 * @param string $string to convert
 * @param string $to encoding you want the string in
 * @param string $from encoding that string is in
 * @return string
 */
function encode($string, $to = 'UTF-8', $from = 'UTF-8')
{
    // ASCII is already valid UTF-8
    if($to == 'UTF-8' AND is_ascii($string))
    {
        return $string;
    }

    // Convert the string
    return @iconv($from, $to . '//TRANSLIT//IGNORE', $string);
}


/**
 * Tests whether a string contains only 7bit ASCII characters.
 *
 * @param string $string to check
 * @return bool
 */
function is_ascii($string)
{
    return ! preg_match('/[^\x00-\x7F]/S', $string);
}

Then just run the input through these functions.

$utf8_string = normalizer_normalize(encode($_POST['text']), Normalizer::FORM_C);

Translations

As Andre said, it seems gettext is the smart default choice for writing applications that can be translated.

  1. Gettext reads compiled binary catalogs (.mo files), so lookups are quite quick.
  2. The gettext implementation is usually simpler, as it only requires _('Text to translate').
  3. There are existing tools for translators to use, and they're proven to work well.

When you reach Facebook's size, you can work on implementing RAM-cached alternative methods like the one I mentioned in the question. However, nothing beats "simple, fast, and works" for most projects.

However, there are also additional things that gettext cannot handle, such as displaying dates, money, and numbers. For those you need the intl extension.

/**
 * Return an IntlDateFormatter object using the current system locale
 *
 * @param string $locale string
 * @param integer $datetype IntlDateFormatter constant
 * @param integer $timetype IntlDateFormatter constant
 * @param string $timezone Time zone ID, default is system default
 * @return IntlDateFormatter
 */
function __date($locale = NULL, $datetype = IntlDateFormatter::MEDIUM, $timetype = IntlDateFormatter::SHORT, $timezone = NULL)
{
    return new IntlDateFormatter($locale ?: setlocale(LC_ALL, 0), $datetype, $timetype, $timezone);
}

$now = new DateTime();
print __date()->format($now);
$time = __date()->parse($string);

In addition, you can use strftime to format dates taking the current locale into consideration (note that strftime is deprecated as of PHP 8.1).

Sometimes you need the values for numbers and dates inserted correctly into locale messages:

/**
 * Format the given string using the current system locale
 * Basically, it's sprintf on i18n steroids.
 *
 * @param string $string to parse
 * @param array $params to insert
 * @return string
 */
function __($string, array $params = array())
{
    return msgfmt_format_message(setlocale(LC_ALL, 0), $string, $params);
}

// Multiple choices (can also just use ngettext)
print __(_("{0,choice,0#no errors|1#single error|1<{0, number} errors}"), array(4));

// Show time in the correct way
print __(_("It is now {0,time,medium}"), array(time()));

See the ICU format details for more information.

Database

Make sure your connection to the database is using the correct charset so that nothing gets corrupted on storage.
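As one possible illustration, assuming MySQL accessed through PDO (the answer does not name a database or driver), the charset can be pinned in the DSN. The helper name, host, database name, and credentials are placeholders.

```php
<?php
/**
 * Build a PDO MySQL DSN that pins the connection charset.
 * utf8mb4 is MySQL's full UTF-8 character set.
 */
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

echo mysql_dsn('localhost', 'cms'), "\n";
// mysql:host=localhost;dbname=cms;charset=utf8mb4

// With a real server (placeholder credentials):
// $pdo = new PDO(mysql_dsn('localhost', 'cms'), 'user', 'secret');
```

The tables and columns themselves also need a matching character set, otherwise the conversion problem just moves to the storage layer.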

String Functions

You need to understand the difference between the string, mb_string, and grapheme functions.

// 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) normalization form "D"
$char_a_ring_nfd = "a\xCC\x8A";

var_dump(grapheme_strlen($char_a_ring_nfd)); // int(1): one user-perceived character
var_dump(mb_strlen($char_a_ring_nfd));       // int(2): two code points (UTF-8 internal encoding)
var_dump(strlen($char_a_ring_nfd));          // int(3): three bytes

// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_A_ring = "\xC3\x85";

var_dump(grapheme_strlen($char_A_ring)); // int(1)
var_dump(mb_strlen($char_A_ring));       // int(1)
var_dump(strlen($char_A_ring));          // int(2)

Domain name TLDs

The IDN functions from the intl library are a big help when processing non-ASCII domain names.
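A short sketch of those IDN functions in use. It needs the intl extension, and the two wrapper function names are hypothetical; idn_to_ascii() and idn_to_utf8() are the actual intl calls.

```php
<?php
/** Convert a Unicode domain name to its ASCII (Punycode) form. */
function domain_to_ascii(string $domain)
{
    return idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

/** Convert an ASCII (Punycode) domain name back to Unicode. */
function domain_to_unicode(string $domain)
{
    return idn_to_utf8($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

// Only run the demo when the intl extension is available
if (function_exists('idn_to_ascii')) {
    $ascii = domain_to_ascii('bücher.de');
    echo $ascii, "\n";                    // xn--bcher-kva.de
    echo domain_to_unicode($ascii), "\n"; // bücher.de
}
```

Both intl functions return false on failure, so real code should check the result before using it.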

#3


3  

There are a number of other SO questions and answers similar to this one. I suggest you search for them and read them as well.

Advice? Use an existing solution like gettext or XLIFF, as it will save you lots of grief when you hit all the translation edge cases that break formatting, such as right-to-left text, date formats, and differing text volumes (French is about 30% more verbose than English, for example).

Even better advice: don't do it. If users want a translation, they will make a clone and translate it. Localisation is more about look and feel and colloquial language, so this is usually what happens. To give another example, Anglo-Saxon culture likes cool web colours and sans-serif typefaces, while Hispanic culture likes bright colours and serif or cursive type. To cater for both you would need different layouts per language.

Zend_Translate actually provides the following adapters, which make a useful list:

  • Array: uses PHP arrays; for small pages; simplest usage; only for programmers
  • Csv: uses comma-separated (.csv/.txt) files; simple text file format; fast; possible problems with Unicode characters
  • Gettext: uses binary gettext (*.mo) files; GNU standard for Linux; thread-safe; needs tools for translation
  • Ini: uses simple INI (*.ini) files; simple text file format; fast; possible problems with Unicode characters
  • Tbx: uses TermBase eXchange (.tbx/.xml) files; industry standard for inter-application terminology strings; XML format
  • Tmx: uses TMX (.tmx/.xml) files; industry standard for inter-application translation; XML format; human readable
  • Qt: uses Qt Linguist (*.ts) files; cross-platform application framework; XML format; human readable
  • Xliff: uses XLIFF (.xliff/.xml) files; a simpler format than TMX but related to it; XML format; human readable
  • XmlTm: uses XmlTm (*.xml) files; industry standard for XML document translation memory; XML format; human readable
  • Others: *.sql; other adapters may be implemented in the future

#4


3  

I'm using the ICU stuff in my framework and finding it really simple and useful. My system is XML-based with XPath queries, not a database as you're suggesting. I've not found this approach to be inefficient. I played around with resource bundles too when researching techniques, but found them quite complicated to implement.

The Locale functionality is a godsend. It makes things like this much easier:

// Available translations
$languages = array('en', 'fr', 'de');

// The language the user wants
$preference = (isset($_COOKIE['lang'])) ?
    $_COOKIE['lang'] : ((isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) ?
        Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']) : '');

// Match preferred language to those available, defaulting to generic English
$locale = Locale::lookup($languages, $preference, false, 'en');

// Construct path to dictionary file
$file = $dir . '/' . $locale . '.xsl';

// Check that dictionary file is readable
if (!file_exists($file) || !is_readable($file)) {
    throw new RuntimeException('Dictionary could not be loaded');
}

// Load and return dictionary file
$dictionary = simplexml_load_file($file);

I then perform word lookups using a method like this:

$selector = '/i18n/text[@label="' . $word . '"]';
$result = $dictionary->xpath($selector);
$text = array_shift($result);

if ($formatted && isset($text)) {
    return new MessageFormatter($locale, $text);
}

The bonus for my system is that the template system is XSL-based which means I can use the same translation XML files directly in my templates for simple messages that don't need any i18n formatting.

#5


1  

Stick with gettext; you won't find a faster alternative in PHP.

As for the how, you can use a database to store your catalog and allow other users to translate the strings using a friendly GUI. When the new changes are reviewed and approved, hit a button, compile a new .mo file, and deploy.
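The "hit a button" step could be sketched roughly as below: approved rows are written out as a PO file, which GNU gettext's msgfmt tool then compiles to .mo. The row layout (message id => translation) and the function names are assumptions; a real catalog would be read from the database.

```php
<?php
/** Escape a string for use inside a double-quoted PO value. */
function po_escape(string $text): string
{
    // Backslashes are replaced first so later escapes are not doubled
    return str_replace(
        ["\\", '"', "\n", "\t"],
        ['\\\\', '\\"', '\\n', '\\t'],
        $text
    );
}

/**
 * Build PO file contents from approved rows (msgid => msgstr).
 */
function build_po(array $rows, string $language): string
{
    // Minimal header entry: empty msgid, metadata lines in msgstr
    $po  = "msgid \"\"\n";
    $po .= "msgstr \"\"\n";
    $po .= "\"Language: " . po_escape($language) . "\\n\"\n";
    $po .= "\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n";

    foreach ($rows as $msgid => $msgstr) {
        $po .= 'msgid "' . po_escape((string) $msgid) . "\"\n";
        $po .= 'msgstr "' . po_escape($msgstr) . "\"\n\n";
    }
    return $po;
}

// Hypothetical approved translations for one locale
$rows = ['error.404_not_found' => 'Page not found'];
echo build_po($rows, 'en');

// After saving the output as messages.po, compile it on deploy:
//   msgfmt messages.po -o messages.mo
```

Keeping the database as the editing surface and the .mo file as the runtime format gives translators a GUI without giving up gettext's lookup speed.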

Some resources to get you on track:

#6


1  

What about CSV files (which can be easily edited in many apps) with caching in memcache (WinCache, etc.)? This approach works well in Magento. All language phrases in the code are wrapped in a __() function, for example:

<?php echo $this->__('Some text') ?>

Then, for example before a new version is released, you run a simple script that parses the source files, finds all text wrapped in __(), and puts it into a .csv file. You load the CSV files and cache them in memcache. In the __() function you look up the translation in memcache.
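A rough sketch of such an extraction script, not Magento's actual tooling. It uses a regular expression that only handles plain single- or double-quoted literals (no concatenation or interpolation), and the function names and two-column CSV layout are assumptions.

```php
<?php
/**
 * Collect the string literals passed to __() in a piece of PHP source.
 */
function extract_phrases(string $source): array
{
    // __( + optional space + quote + (escaped char | any non-quote char)* + same quote
    preg_match_all('/__\(\s*([\'"])((?:\\\\.|(?!\1).)*)\1/s', $source, $matches);
    return array_values(array_unique(array_map('stripslashes', $matches[2])));
}

/**
 * Render phrases as CSV: source phrase in column 1, translation in
 * column 2 (initially identical, to be edited by translators).
 */
function phrases_to_csv(array $phrases): string
{
    $handle = fopen('php://memory', 'w+');
    foreach ($phrases as $phrase) {
        fputcsv($handle, [$phrase, $phrase], ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}

$source = 'echo $this->__(\'Some text\'); echo __("Add to cart");';
$phrases = extract_phrases($source);
print_r($phrases);
echo phrases_to_csv($phrases);
```

A tokenizer-based extractor would be more robust than a regular expression, but this is enough to show the shape of the build step.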

#7


0  

In a recent project, we considered using gettext, but it turned out to be easier to just write our own functionality. It really is quite simple: create a JSON file per locale (e.g. strings.en.json, strings.es.json, etc.), create a function somewhere called "translate()" or something, and then just call that. That function determines the current locale (from the URI, a session variable, or something similar) and returns the localized string.
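A minimal sketch of that idea, using the strings.<locale>.json naming from the answer. The function signatures, the locale check, and the choice to return the key itself for a missing string are assumptions.

```php
<?php
/**
 * Load the string table for a locale from strings.<locale>.json in $dir.
 * Returns an empty table when the locale looks invalid or the file is
 * missing or malformed.
 */
function load_strings(string $dir, string $locale): array
{
    // Accept only simple locale codes so user input cannot alter the path
    if (!preg_match('/^[a-z]{2}(_[A-Z]{2})?$/', $locale)) {
        return [];
    }
    $file = $dir . '/strings.' . $locale . '.json';
    if (!is_readable($file)) {
        return [];
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : [];
}

/** Return the localized string, or the key itself when none exists. */
function translate(string $key, array $strings): string
{
    return $strings[$key] ?? $key;
}

// Demo: write a small Spanish table to the temp directory and read it back
$dir = sys_get_temp_dir();
file_put_contents($dir . '/strings.es.json', json_encode(['greeting' => 'Hola']));

$strings = load_strings($dir, 'es');
echo translate('greeting', $strings), "\n";    // Hola
echo translate('missing.key', $strings), "\n"; // missing.key
```

In a real application the table would be loaded once per request (or cached), and the locale would come from the URI or session as the answer describes.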

The only thing to remember is to make sure any HTML you output is encoded in UTF-8 and marked as such (e.g. with a meta charset tag in the markup or in the Content-Type header).

#8


0  

Maybe not really an answer to your question, but maybe you can get some ideas from the Symfony translation component? It looks very good to me, although I must confess I haven't used it myself yet.

The documentation for the component can be found at

http://symfony.com/doc/current/book/translation.html

and the code for the component can be found at

https://github.com/symfony/Translation.

It should be easy to use the Translation component, because Symfony components are designed to be usable as standalone components.

#9


0  

On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.

This isn't actually true. You can have a header file (sorry, ex-C programmer), such as:

<?php
define('MSG_404_NOT_FOUND', 'error.404_not_found');
?>

Then whenever you want a message, use _(MSG_404_NOT_FOUND). This is much more flexible than requiring developers to remember the exact syntax of the non-localised message every time they want to spit out a localised version.

You could go one step further and generate the header file in a build step, maybe from a CSV file or a database, and cross-reference it with the translations to detect missing strings.
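Such a build step might look something like this sketch. The key list, the per-locale tables, and the constant naming rule are all assumptions; the naming rule here yields MSG_ERROR_404_NOT_FOUND rather than the shorter name used above.

```php
<?php
// Hypothetical inputs: message keys known to the application, and
// per-locale translation tables (e.g. loaded from CSV or a database)
$keys = ['error.404_not_found', 'error.500_server_error'];
$translations = [
    'en' => [
        'error.404_not_found'    => 'Page not found',
        'error.500_server_error' => 'Server error',
    ],
    'fr' => [
        'error.404_not_found' => 'Page introuvable',
    ],
];

/** Derive a constant name from a message key. */
function constant_name(string $key): string
{
    return 'MSG_' . strtoupper(preg_replace('/[^A-Za-z0-9]+/', '_', $key));
}

/** Generate the contents of the "header" file of define() calls. */
function generate_header(array $keys): string
{
    $lines = ['<?php'];
    foreach ($keys as $key) {
        $lines[] = sprintf(
            'define(%s, %s);',
            var_export(constant_name($key), true),
            var_export($key, true)
        );
    }
    return implode("\n", $lines) . "\n";
}

/** List the keys that have no entry in a locale's translation table. */
function missing_keys(array $keys, array $table): array
{
    return array_values(array_diff($keys, array_keys($table)));
}

echo generate_header($keys);
foreach ($translations as $locale => $table) {
    foreach (missing_keys($keys, $table) as $key) {
        echo "Missing in $locale: $key\n";
    }
}
```

Failing the build when missing_keys() returns anything for a required locale would stop raw keys from reaching users.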

#10


0  

I have a Zend plugin that works very well for this.

<?php
/** dependencies **/
require 'Zend/Loader/Autoloader.php';
require 'Zag/Filter/CharConvert.php';

Zend_Loader_Autoloader::getInstance()->setFallbackAutoloader(true);

//filter
$filter = new Zag_Filter_CharConvert(array(
    'replaceWhiteSpace' => '-',
    'locale' => 'en_US',
    'charset'=> 'UTF-8'
));

echo $filter->filter('ééé ááá 90');//eee-aaa-90
echo $filter->filter('óóó 10aáééé');//ooo-10aaeee

If you do not want to use Zend Framework, you can use the plugin on its own.

Hug!
