Localizing dynamically generated data

Time: 2022-07-14 19:17:12

This was a hard question for me to summarize so we may need to edit this a bit.

Background

About four years ago, we had to translate our ASP.NET application for our clients in Mexico. Extensibility and scalability were not much of a concern at the time (oh yes, I just said those dreadful words) because we only had U.S. and Mexican customers.

Rather than use resource files, we replaced every single piece of static text in our application with some type of server control (an ASP.NET label, for example). We store each and every English word in a SQL database. We have added the ability to translate the English text into another language, and we can also add cultural overrides. For example, hello can be translated to ¡hola! in one language and overridden to ¡bueno! in a different culture. The business has full control over these translations because we built management utilities that let them control everything. The translation kicks in when we detect that the user has a browser culture other than en-US. Every form descends from a base form that iterates through each server control and executes a translation (the translation data for a culture is stored as a DataTable in an application variable). I'm still amazed at how fast the control iteration is.
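
A minimal sketch of the lookup described above, written in Python rather than ASP.NET to keep it short. The `Control` class, the `lookup` and `translate_tree` functions, and the dictionary standing in for the per-culture DataTable are all hypothetical names, and the fallback order (exact culture, then bare language, then the English text) is an assumption about how the cultural overrides are resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    """Hypothetical stand-in for a server control that carries display text."""
    text: str
    children: list = field(default_factory=list)


def lookup(text, culture, translations):
    """Return the best translation of `text` for `culture`.

    Assumed order: culture override ("es-MX"), then language ("es"),
    then the untranslated English text.
    """
    if culture.lower() == "en-us":
        return text
    language = culture.split("-")[0]
    for candidate in (culture, language):
        translated = translations.get((text, candidate))
        if translated is not None:
            return translated
    return text


def translate_tree(control, culture, translations):
    """Walk the control tree the way the base form does and translate each control."""
    control.text = lookup(control.text, culture, translations)
    for child in control.children:
        translate_tree(child, culture, translations)


translations = {("hello", "es"): "¡hola!", ("hello", "es-MX"): "¡bueno!"}
form = Control("hello", [Control("hello"), Control("goodbye")])
translate_tree(form, "es-MX", translations)
```

Text with no entry, such as "goodbye" here, is left in English, which is also what happens to the dynamically generated notes discussed below.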

The problem

The business is very happy with how the translations work. In addition to the static content that I mentioned above, the business now wants to have certain data translated as well. System notes are a good example of a translation they want. Example: "Sent Letter #XXXX to Customer". The business wants the "Sent Letter to Customer" text translated based on the user's browser culture.

I have read a couple of other posts on SO that talk about localization but they don't address my problem. How do you translate a phrase that is dynamically generated? I could easily read the English text and translate "Sent", "Letter", "to" and "Customer", but I guarantee that it will look stupid to the end user because it's a phrase. The dynamic part of the system-generated note would screw up any look-ups that we perform on the phrase if we stored the phrase in English, less the dynamic text.

One thought I had... We don't have a table of system-generated note types. I suppose we could create one that had placeholders for dynamic data, and the translation engine would ignore the placeholder markers. The problem with this approach is that our SQL Server database is a replication of an old Pick database, and we don't really know all the types of system-generated phrases (they are deep in the Pick code base, in subroutines, control files, etc.). Things like notes, ticklers, and payment rejection reasons are all stored differently. Trying to normalize this data has proven difficult. It would be a huge effort to go back and identify and change every Pick program that generates a message.

This question is very close; but I'm not dealing with just system-generated status messages but rather an infinite number of phrases and types of phrases with no central generation mechanism.

Any ideas?

4 answers

#1


The lack of a "bottleneck" -- what you identify as the (missing) "central generation mechanism" -- is the architectural problem in this situation. Ideally, rearchitecting to put such a bottleneck in place (so you can keep using your general approach with a database of culture-appropriate renditions of messages, just with "placeholders" for e.g. the #XXXX in your example) would be best.

If that's just unfeasible, you can place the "bottleneck" at the other end of the pipe -- when a message is about to be emitted. At that point, or those few points, you need to try to match the (English) string that's about to be emitted against a series of well-crafted regular expressions (with "placeholders" typically like (.*?)...) and thereby identify the appropriate key for the DB lookup. Yes, that still is a lot of work, but at least it should be feasible without the issues you mention wrt the old translated Pick code.
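
A rough sketch of that emit-side matching, in Python for brevity. Everything named here is hypothetical: the `PATTERNS` table, the `note.sent_letter` key, the `translate_message` function, and the in-memory `TRANSLATIONS` dictionary standing in for the DB lookup. The Spanish template is an illustrative placeholder, not a reviewed translation.

```python
import re

# Hypothetical pattern table: each regex identifies one kind of message and
# captures its dynamic parts in named groups.
PATTERNS = [
    (re.compile(r"^Sent Letter #(?P<letter_num>\S+?) to Customer$"), "note.sent_letter"),
]

# Hypothetical stand-in for the translation database: (key, culture) -> template.
TRANSLATIONS = {
    ("note.sent_letter", "es-MX"): "Se envió la carta #{letter_num} al cliente",
}


def translate_message(text, culture):
    """Translate an outgoing English message, or return it unchanged."""
    for pattern, key in PATTERNS:
        match = pattern.match(text)
        if match is None:
            continue
        template = TRANSLATIONS.get((key, culture))
        if template is None:
            return text  # known message type, but no translation for this culture
        # Re-insert the captured dynamic values into the translated template.
        return template.format(**match.groupdict())
    return text  # no pattern recognised this message


print(translate_message("Sent Letter #1234 to Customer", "es-MX"))
```

Messages that no pattern recognises pass through in English, so the pattern table can grow gradually as unmatched messages are logged and reviewed.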

#2


We use the technique you propose, with insertion points.

"Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"

Which might be (in reverse Pig Latin, say):

"Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"

Note that this handles cases where the particular target language reverses the order of insertion, etc. It does not handle subtleties like first, second, etc., which have to be handled with application logic or more phrases:

"This is your {0:first, second, third} warning"
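
As an illustration only, here is one way the `{index:description}` insertion points above could be filled in, sketched in Python. The `render` function and the `INSERTION_POINT` pattern are hypothetical; the assumption is that the text after the colon is a note for translators and is ignored at render time.

```python
import re

# Matches "{0}" or "{0:any description}" and captures the numeric index.
INSERTION_POINT = re.compile(r"\{(\d+)(?::[^}]*)?\}")


def render(template, *values):
    """Replace each insertion point with the value at its index."""
    def substitute(match):
        return str(values[int(match.group(1))])
    return INSERTION_POINT.sub(substitute, template)


english = "Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"
pig_latin = "Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"

print(render(english, 1234, "Jane Doe"))
print(render(pig_latin, 1234, "Jane Doe"))
```

Because values are addressed by index rather than by position in the sentence, the translated template is free to reorder them.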

#3


In a pinch I suppose you could try something like foisting the job off onto Google if you don't have a translation on hand for a particular phrase, and stashing the translation for later.

Stashing the translations for later provides both a data collection point for building a message catalog and a rough (if sometimes laughably wonky) dynamically built starter set of translations. Once you begin the process, track which translations have been reviewed and how frequently each has been hit. Frequently hit machine translations can then be reviewed and refined.
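
A small sketch of that stash-and-review loop, in Python. The class and method names are hypothetical, and the machine translator is passed in as a plain callable so the sketch makes no assumptions about any real translation service's API.

```python
from dataclasses import dataclass


@dataclass
class CachedTranslation:
    text: str
    reviewed: bool = False  # set once a human has checked the translation
    hits: int = 0           # how often this phrase has been requested


class TranslationCache:
    def __init__(self, machine_translate):
        # machine_translate: hypothetical callable (phrase, culture) -> str
        self._machine_translate = machine_translate
        self._entries = {}

    def translate(self, phrase, culture):
        """Return the stashed translation, machine-translating on a miss."""
        key = (phrase, culture)
        entry = self._entries.get(key)
        if entry is None:
            entry = CachedTranslation(self._machine_translate(phrase, culture))
            self._entries[key] = entry
        entry.hits += 1
        return entry.text

    def mark_reviewed(self, phrase, culture, corrected_text):
        """Store a human-reviewed translation over the machine one."""
        entry = self._entries[(phrase, culture)]
        entry.text = corrected_text
        entry.reviewed = True

    def review_queue(self):
        """Unreviewed entries, most frequently hit first."""
        pending = [(k, e) for k, e in self._entries.items() if not e.reviewed]
        return sorted(pending, key=lambda item: item[1].hits, reverse=True)
```

The review queue puts the most frequently hit machine translations in front of a human first, which is where a bad translation would be seen most often.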

#4


Dynamic machine translation is not suitable for a product that you actually expect people to pay money for. The only way to do it is with static templates containing insertion points (as Cade Roux has demonstrated in his answer).

There's no getting around a thorough refactoring of your code to make this feasible. The alternative is to do nothing with those phrases (which is what you're doing now, and it's working out okay, right?). Usually no translation is better than embarrassingly bad translation.
