
Date: 2022-07-14 19:17:12

This was a hard question for me to summarize so we may need to edit this a bit.



About four years ago, we had to translate our ASP.NET application for our clients in Mexico. Extensibility and scalability were not that much of a concern at the time (oh yes, I just said those dreadful words) because we only had U.S. and Mexican customers.


Rather than use resource files, we replaced every single piece of static text in our application with some type of server control (an ASP.NET label, for example). We store each and every English word in a SQL database. We have added the ability to translate the English text into another language, and we can also add cultural overrides. For example, hello can be translated to ¡hola! in one language and overridden to ¡bueno! in a different culture. The business has full control over these translations because we built management utilities that let them control everything. The translation kicks in when we detect that the user has a browser culture other than en-US. Every form descends from a base form that iterates through each server control and executes a translation (the translation data for a culture is stored as a DataTable in an application variable). I'm still amazed at how fast the control iteration is.
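To make the override behaviour concrete, here is a minimal sketch of the lookup in Python (not the ASP.NET code we actually run). The `TRANSLATIONS` dictionary and the `translate` function are hypothetical stand-ins for the per-culture DataTable and the base-form logic; the fallback order (full culture, then bare language, then English) is an assumption about how the overrides resolve.

```python
# Hypothetical in-memory stand-in for the per-culture translation table.
# Keys are lower-cased culture codes; values map English text to a translation.
TRANSLATIONS = {
    "es": {"hello": "¡hola!"},
    "es-mx": {"hello": "¡bueno!"},  # cultural override for Mexico
}


def translate(text, culture):
    """Return the translation of `text` for `culture`.

    Tries the full culture first (e.g. es-mx), then the bare language
    (e.g. es), and finally falls back to the untranslated English text.
    """
    culture = culture.lower()
    if culture == "en-us":
        return text  # translation only kicks in for non en-US browsers
    language = culture.split("-")[0]
    key = text.lower()
    for candidate in (culture, language):
        table = TRANSLATIONS.get(candidate, {})
        if key in table:
            return table[key]
    return text
```

With this data, `translate("hello", "es-MX")` picks the override and `translate("hello", "es-ES")` falls back to the language-level entry.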


The problem

The business is very happy with how the translations work. In addition to the static content that I mentioned above, the business now wants to have certain data translated as well. System notes are a good example of a translation they want. For example, given "Sent Letter #XXXX to Customer", the business wants the "Sent Letter to Customer" text translated based on the user's browser culture.


I have read a couple of other posts on SO that talk about localization but they don't address my problem. How do you translate a phrase that is dynamically generated? I could easily read the English text and translate "Sent", "Letter", "to" and "Customer", but I guarantee that it will look stupid to the end user because it's a phrase. The dynamic part of the system-generated note would screw up any look-ups that we perform on the phrase if we stored the phrase in English, less the dynamic text.


One thought I had... We don't have a table of system-generated note types. I suppose we could create one that had placeholders for dynamic data, and the translation engine would ignore the placeholder markers. The problem with this approach is that our SQL Server database is a replication of an old Pick database and we don't really know all the types of system-generated phrases (they are deep in the Pick code base, in subroutines, control files, etc.). Things like notes, ticklers, and payment rejection reasons are all stored differently. Trying to normalize this data has proven difficult. It would be a huge effort to go back and identify and change every Pick program that generates a message.


This question is very close, but I'm not dealing with just system-generated status messages. I'm dealing with an effectively unbounded number of phrases and types of phrases, with no central generation mechanism.


Any ideas?

4 solutions


The lack of a "bottleneck" -- what you identify as the (missing) "central generation mechanism" -- is the architectural problem in this situation. Ideally, rearchitecting to put such a bottleneck in place (so you can keep using your general approach with a database of culture-appropriate renditions of messages, just with "placeholders" for e.g. the #XXXX in your example) would be best.


If that's just unfeasible, you can place the "bottleneck" at the other end of the pipe -- when a message is about to be emitted. At that point, or at those few points, you need to try to match the (English) string that's about to be emitted against a series of well-crafted regular expressions (with "placeholders" typically like (.*?)...) and thereby identify the appropriate key for the DB lookup. Yes, that is still a lot of work, but at least it should be feasible without the issues you mention with respect to the old translated Pick code.
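A rough Python sketch of that output-side matching, just to show the shape of it. Everything named here is hypothetical: the `PATTERNS` list, the `note.sent_letter` key, the `TEMPLATES` dictionary standing in for the DB lookup, and the Spanish wording, which is only an illustrative placeholder for whatever the business enters.

```python
import re

# Hypothetical catalog: each entry pairs a regex for one shape of English
# output with a lookup key. Named groups capture the dynamic parts.
# Add one entry per message shape as they are discovered.
PATTERNS = [
    (re.compile(r"^Sent Letter #(?P<letter>\S+?) to Customer$"), "note.sent_letter"),
]

# Hypothetical stand-in for the translation table, keyed by (key, language).
# The Spanish text is illustrative only.
TEMPLATES = {
    ("note.sent_letter", "es"): "Se envió la carta #{letter} al cliente",
}


def localize(message, language):
    """Translate an outgoing English message if a known pattern matches it.

    Unrecognized messages, and recognized ones without a translation for
    `language`, are returned unchanged.
    """
    for pattern, key in PATTERNS:
        match = pattern.match(message)
        if match is None:
            continue
        template = TEMPLATES.get((key, language))
        if template is None:
            return message
        # Re-insert the captured dynamic parts into the translated template.
        return template.format(**match.groupdict())
    return message
```

Messages that match nothing pass through in English, so the catalog can grow one pattern at a time without breaking anything that is not covered yet.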



We use the technique you propose, with insertion points.


"Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"


Which might be (in reverse Pig Latin, say):


"Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"
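For what it's worth, a minimal Python sketch of how such templates could be rendered. The `render` function and the sample values are assumptions for illustration, not our production code; the text after the colon in each insertion point is treated purely as a note for translators and is discarded.

```python
import re

# Matches an insertion point such as {0:Letter Num}: a numeric index, then
# an optional description that exists only to help the translator.
INSERTION_POINT = re.compile(r"\{(\d+)(?::[^}]*)?\}")


def render(template, *values):
    """Replace each {N:description} marker with the N-th value."""
    return INSERTION_POINT.sub(lambda m: str(values[int(m.group(1))]), template)


english = "Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"
pig_latin = "Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"

print(render(english, 1234, "Ana Perez"))
print(render(pig_latin, 1234, "Ana Perez"))
```

Because the values are addressed by index rather than by position in the sentence, the translated template is free to put them in a different order.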


Note that this handles cases where the particular target language reverses the order of insertion, etc. It does not handle subtleties like first, second, etc., which have to be handled with application logic or more phrases:


"This is your {0:first, second, third} warning"
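One way to handle the "more phrases" route is to store a complete sentence per ordinal and per language, and let the application pick one. A small Python sketch, assuming a hypothetical `WARNING_PHRASES` catalog; the Spanish sentences are illustrative placeholders only.

```python
# Hypothetical catalog: one complete phrase per ordinal and language, so
# translators never have to splice an ordinal word into a sentence.
WARNING_PHRASES = {
    "en": {
        1: "This is your first warning",
        2: "This is your second warning",
        3: "This is your third warning",
    },
    "es": {
        1: "Esta es su primera advertencia",
        2: "Esta es su segunda advertencia",
        3: "Esta es su tercera advertencia",
    },
}


def warning_text(count, language):
    """Pick the whole phrase for this warning count.

    Falls back to the English catalog when the language is unknown.
    Raises KeyError for a count that has no phrase at all.
    """
    phrases = WARNING_PHRASES.get(language, WARNING_PHRASES["en"])
    return phrases[count]
```

The application logic decides which ordinal applies; the catalog only ever holds whole sentences.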



In a pinch I suppose you could try something like foisting the job off onto Google if you don't have a translation on hand for a particular phrase, and stashing the translation for later.


Stashing the translations for later provides both a data collection point for building a message catalog and a rough (if sometimes laughably wonky) dynamically built starter set of translations. Once you begin the process, track which translations have been reviewed and how frequently each has been hit. Frequently hit machine translations can then be reviewed and refined.
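A sketch of that stash-and-review loop in Python, under stated assumptions: `machine_translate` is a hypothetical callable you would supply (wrapping whichever translation service you use), and the in-memory dictionary stands in for a real table. No actual translation API is called here.

```python
from dataclasses import dataclass


@dataclass
class CachedTranslation:
    text: str
    reviewed: bool = False  # True once a human has checked the wording
    hits: int = 0           # how often this phrase has been requested


class TranslationCache:
    def __init__(self, machine_translate):
        # machine_translate: hypothetical callable (phrase, language) -> str
        self._machine_translate = machine_translate
        self._entries = {}

    def translate(self, phrase, language):
        """Return the stashed translation, machine-translating on first use."""
        key = (phrase, language)
        entry = self._entries.get(key)
        if entry is None:
            entry = CachedTranslation(self._machine_translate(phrase, language))
            self._entries[key] = entry
        entry.hits += 1
        return entry.text

    def mark_reviewed(self, phrase, language, corrected_text):
        """Record a human-approved wording for a stashed phrase."""
        entry = self._entries[(phrase, language)]
        entry.text = corrected_text
        entry.reviewed = True

    def review_queue(self):
        """Unreviewed entries, most frequently hit first."""
        pending = [(k, e) for k, e in self._entries.items() if not e.reviewed]
        return sorted(pending, key=lambda item: item[1].hits, reverse=True)
```

The review queue gives whoever maintains the translations a prioritized list, so the phrases users see most often get corrected first.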



Dynamic machine translation is not suitable for a product that you actually expect people to pay money for. The only way to do it is with static templates containing insertion points (as Cade Roux has demonstrated in his answer).


There's no getting around a thorough refactoring of your code to make this feasible. The alternative is to do nothing with those phrases (which is what you're doing now, and it's working out okay, right?). Usually no translation is better than embarrassingly bad translation.



