How do I convert plural words to singular?

Time: 2021-09-13 10:16:35

I'm preparing some table names for an ORM, and I want to turn plural table names into single entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now:

  1. If a word ends with -ies, I replace the ending with -y.
  2. If a word ends with -es, I remove this ending. This doesn't always work, however: for example, it turns Types into Typ.
  3. Otherwise, I just remove the trailing -s.
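
For reference, the three rules above can be sketched roughly as follows. This is only an illustration in Python; the function name `naive_singularize` is made up here, and it assumes a word that matches none of the suffixes is returned unchanged.

```python
def naive_singularize(word: str) -> str:
    """Apply the three suffix rules from the question, in order."""
    lower = word.lower()
    if lower.endswith("ies"):
        return word[:-3] + "y"   # rule 1: -ies becomes -y
    if lower.endswith("es"):
        return word[:-2]         # rule 2: drop -es
    if lower.endswith("s"):
        return word[:-1]         # rule 3: drop the trailing -s
    return word                  # assumption: leave anything else alone


for name in ["Categories", "Boxes", "Users", "Types"]:
    print(name, "->", naive_singularize(name))
```

The last word shows the problem described in rule 2: "Types" loses its "e" as well.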

Does anyone know of a better algorithm?

13 answers

#1


Those are all general rules (and good ones) but English is not a language for the faint of heart :-).

My own preference would be to have a transformation engine along with a set of transformations (surprisingly enough) for doing the actual work.

You would run through the transformations (from specific to general) and, when a match was found, apply the transformation to the word.

Regular expressions would be an ideal approach to this due to their expressiveness. An example rule set:

 1. If the word is "fish", return "fish".
 2. If the word is "sheep", return "sheep".
 3. If the word is "radii", return "radius".
 4. If the word is "types", return "type".
 5. If the word ends in "ii", replace that "ii" with "us" (octopii, virii).
    : : : : :
97. If the word ends in "ies", replace that ending with "y".
98. If the word ends in "es", remove that ending.
99. Otherwise, remove the trailing "s".

Note that an earlier version of the rules may not have had entry number 4. However, when we found the problem with "types" being transformed to "typ" at 98, we then created a higher-priority transformation at 4 to cater for this.
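
A minimal sketch of such an engine, assuming Python and its standard `re` module. The rule table only contains the handful of rules listed above, and names like `RULES` and `singularize` are chosen for this illustration. Case preservation is left out: whole-word rules return lowercase.

```python
import re

# Ordered from most specific to most general; the first matching rule wins.
RULES = [
    (re.compile(r"^(fish|sheep)$", re.IGNORECASE), r"\1"),  # unchanged plurals
    (re.compile(r"^radii$", re.IGNORECASE), "radius"),
    (re.compile(r"^types$", re.IGNORECASE), "type"),        # added after "typ" showed up
    (re.compile(r"ii$", re.IGNORECASE), "us"),
    (re.compile(r"ies$", re.IGNORECASE), "y"),
    (re.compile(r"es$", re.IGNORECASE), ""),
    (re.compile(r"s$", re.IGNORECASE), ""),
]


def singularize(word: str) -> str:
    """Return the result of the first rule whose pattern matches the word."""
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word  # no rule matched, so leave the word as it is


for plural in ["fish", "radii", "types", "virii", "categories", "boxes", "users"]:
    print(plural, "->", singularize(plural))
```

New exceptions are handled by inserting a more specific rule above the general one that gets them wrong, exactly as rule 4 was added for "types".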

You'll basically need to keep this transformation table updated as you find all those wondrous exceptions that English has spawned.

The other possibility is to not waste your time with a general rule. Since the names of the tables will be relatively limited, just create another table (or some sort of data structure) called singulars which maps all the relevant plural table names (employees) to singular object names (employee).

Then every time a table is added, add an entry to the singulars "table" so you can singularize it.
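
As a rough sketch of that second option, assuming Python and an in-memory dict rather than a database table. The entries and the helper name `entity_name` are purely illustrative.

```python
# Explicit mapping from plural table names to singular entity names.
# Add one entry whenever a new table is created.
SINGULARS = {
    "employees": "employee",
    "types": "type",
    "categories": "category",
}


def entity_name(table_name: str) -> str:
    """Look up the singular entity name; fail loudly for unregistered tables."""
    try:
        return SINGULARS[table_name.lower()]
    except KeyError:
        raise KeyError(f"no singular registered for table {table_name!r}") from None


print(entity_name("Employees"))
```

Failing on an unknown table, instead of guessing, makes a missing entry show up the first time the table is used.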

#2


The problem is that this is based on general rules, but English has (figuratively) a billion exceptions... What do you do with words like "fish" or "geese"?

Also, the rules describe how to turn singular nouns into plurals. The reverse mapping isn't necessarily possible (consider "freebies").

#3


Andrew Peters has a class called Inflector.NET which provides plural-to-singular and singular-to-plural methods. As Tal has pointed out, no algorithm is infallible, but this covers a decent number of irregular English nouns.

#4


Maybe take a look at the source code of something like the Rails Inflector.

#5


See also this answer, which recommends using Morpha (or studying the algorithm behind it).

If you know that the words that you want to lemmatize are plural nouns then you can tag them with NNS to get a more accurate output.

Input example:

$ cat test.txt 
Types_NNS
Pies_NNS
Trees_NNS
Buses_NNS
Radii_NNS
Communities_NNS
Sheep_NNS
Fish_NNS

Output example:

$ cat test.txt | ./morpha -c
Type
Pie
Tree
Bus
Radius
Community
Sheep
Fish

#6


As an improvement, you could use rules that generate multiple possibilities and then look up the results in a dictionary to weed out impossible options.

For example, replace -ies with both -y and -ie. Pies becomes Py and Pie. Only one of those is in the dictionary, so choose that one.

Perhaps you can even find a dictionary with frequency information and select the most common word you generate.

If you combine this with an ordered list of rules that covers a few exceptions, you might get pretty good accuracy.
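
Here is one way the candidate-plus-dictionary idea might look, sketched in Python. The tiny `DICTIONARY` set stands in for a real word list, and the suffix table and function names are assumptions made for this example.

```python
# Stand-in word list; a real one might be loaded from a dictionary file.
DICTIONARY = {"pie", "type", "category", "bus", "tree", "box"}

# Each plural suffix maps to the singular endings it could have come from.
CANDIDATE_RULES = [
    ("ies", ["y", "ie"]),
    ("es", ["", "e"]),
    ("s", [""]),
]


def candidates(word):
    """Generate every singular form the rules allow, without duplicates."""
    word = word.lower()
    result = []
    for suffix, endings in CANDIDATE_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            for ending in endings:
                candidate = stem + ending
                if candidate not in result:
                    result.append(candidate)
    return result


def singularize(word, dictionary=DICTIONARY):
    """Return the first candidate found in the dictionary, or None."""
    for candidate in candidates(word):
        if candidate in dictionary:
            return candidate
    return None


for plural in ["pies", "types", "buses", "trees", "categories"]:
    print(plural, "->", singularize(plural))
```

With frequency data, the loop in `singularize` could pick the most common candidate instead of the first one that is found.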

#7


Maybe this is what you need. It works well if you know how to use a PHP script. It can turn plural words into singular words, and singular words into plural words too.

class BaseInflector
{
    /**
     * @var array the rules for converting a word into its plural form.
     * The keys are the regular expressions and the values are the corresponding replacements.
     */
    public static $plurals = [
        '/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media)$/i' => '\1',
        '/^(sea[- ]bass)$/i' => '\1',
        '/(m)ove$/i' => '\1oves',
        '/(f)oot$/i' => '\1eet',
        '/(h)uman$/i' => '\1umans',
        '/(s)tatus$/i' => '\1tatuses',
        '/(s)taff$/i' => '\1taff',
        '/(t)ooth$/i' => '\1eeth',
        '/(quiz)$/i' => '\1zes',
        '/^(ox)$/i' => '\1\2en',
        '/([m|l])ouse$/i' => '\1ice',
        '/(matr|vert|ind)(ix|ex)$/i' => '\1ices',
        '/(x|ch|ss|sh)$/i' => '\1es',
        '/([^aeiouy]|qu)y$/i' => '\1ies',
        '/(hive)$/i' => '\1s',
        '/(?:([^f])fe|([lr])f)$/i' => '\1\2ves',
        '/sis$/i' => 'ses',
        '/([ti])um$/i' => '\1a',
        '/(p)erson$/i' => '\1eople',
        '/(m)an$/i' => '\1en',
        '/(c)hild$/i' => '\1hildren',
        '/(buffal|tomat|potat|ech|her|vet)o$/i' => '\1oes',
        '/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|vir)us$/i' => '\1i',
        '/us$/i' => 'uses',
        '/(alias)$/i' => '\1es',
        '/(ax|cris|test)is$/i' => '\1es',
        '/s$/' => 's',
        '/^$/' => '',
        '/$/' => 's',
    ];
    /**
     * @var array the rules for converting a word into its singular form.
     * The keys are the regular expressions and the values are the corresponding replacements.
     */
    public static $singulars = [
        '/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media|ss)$/i' => '\1',
        '/^(sea[- ]bass)$/i' => '\1',
        '/(s)tatuses$/i' => '\1tatus',
        '/(f)eet$/i' => '\1oot',
        '/(t)eeth$/i' => '\1ooth',
        '/^(.*)(menu)s$/i' => '\1\2',
        '/(quiz)zes$/i' => '\\1',
        '/(matr)ices$/i' => '\1ix',
        '/(vert|ind)ices$/i' => '\1ex',
        '/^(ox)en/i' => '\1',
        '/(alias)(es)*$/i' => '\1',
        '/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|viri?)i$/i' => '\1us',
        '/([ftw]ax)es/i' => '\1',
        '/(cris|ax|test)es$/i' => '\1is',
        '/(shoe|slave)s$/i' => '\1',
        '/(o)es$/i' => '\1',
        '/ouses$/' => 'ouse',
        '/([^a])uses$/' => '\1us',
        '/([m|l])ice$/i' => '\1ouse',
        '/(x|ch|ss|sh)es$/i' => '\1',
        '/(m)ovies$/i' => '\1\2ovie',
        '/(s)eries$/i' => '\1\2eries',
        '/([^aeiouy]|qu)ies$/i' => '\1y',
        '/([lr])ves$/i' => '\1f',
        '/(tive)s$/i' => '\1',
        '/(hive)s$/i' => '\1',
        '/(drive)s$/i' => '\1',
        '/([^fo])ves$/i' => '\1fe',
        '/(^analy)ses$/i' => '\1sis',
        '/(analy|diagno|^ba|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => '\1\2sis',
        '/([ti])a$/i' => '\1um',
        '/(p)eople$/i' => '\1\2erson',
        '/(m)en$/i' => '\1an',
        '/(c)hildren$/i' => '\1\2hild',
        '/(n)ews$/i' => '\1\2ews',
        '/(n)etherlands$/i' => '\1\2etherlands',
        '/eaus$/' => 'eau',
        '/^(.*us)$/' => '\\1',
        '/s$/i' => '',
    ];
    /**
     * @var array the special rules for converting a word between its plural form and singular form.
     * The keys are the special words in singular form, and the values are the corresponding plural form.
     */
    public static $specials = [
        'atlas' => 'atlases',
        'beef' => 'beefs',
        'brother' => 'brothers',
        'cafe' => 'cafes',
        'child' => 'children',
        'cookie' => 'cookies',
        'corpus' => 'corpuses',
        'cow' => 'cows',
        'curve' => 'curves',
        'foe' => 'foes',
        'ganglion' => 'ganglions',
        'genie' => 'genies',
        'genus' => 'genera',
        'graffito' => 'graffiti',
        'hoof' => 'hoofs',
        'loaf' => 'loaves',
        'man' => 'men',
        'money' => 'monies',
        'mongoose' => 'mongooses',
        'move' => 'moves',
        'mythos' => 'mythoi',
        'niche' => 'niches',
        'numen' => 'numina',
        'occiput' => 'occiputs',
        'octopus' => 'octopuses',
        'opus' => 'opuses',
        'ox' => 'oxen',
        'penis' => 'penises',
        'sex' => 'sexes',
        'soliloquy' => 'soliloquies',
        'testis' => 'testes',
        'trilby' => 'trilbys',
        'turf' => 'turfs',
        'wave' => 'waves',
        'Amoyese' => 'Amoyese',
        'bison' => 'bison',
        'Borghese' => 'Borghese',
        'bream' => 'bream',
        'breeches' => 'breeches',
        'britches' => 'britches',
        'buffalo' => 'buffalo',
        'cantus' => 'cantus',
        'carp' => 'carp',
        'chassis' => 'chassis',
        'clippers' => 'clippers',
        'cod' => 'cod',
        'coitus' => 'coitus',
        'Congoese' => 'Congoese',
        'contretemps' => 'contretemps',
        'corps' => 'corps',
        'debris' => 'debris',
        'diabetes' => 'diabetes',
        'djinn' => 'djinn',
        'eland' => 'eland',
        'elk' => 'elk',
        'equipment' => 'equipment',
        'Faroese' => 'Faroese',
        'flounder' => 'flounder',
        'Foochowese' => 'Foochowese',
        'gallows' => 'gallows',
        'Genevese' => 'Genevese',
        'Genoese' => 'Genoese',
        'Gilbertese' => 'Gilbertese',
        'graffiti' => 'graffiti',
        'headquarters' => 'headquarters',
        'herpes' => 'herpes',
        'hijinks' => 'hijinks',
        'Hottentotese' => 'Hottentotese',
        'information' => 'information',
        'innings' => 'innings',
        'jackanapes' => 'jackanapes',
        'Kiplingese' => 'Kiplingese',
        'Kongoese' => 'Kongoese',
        'Lucchese' => 'Lucchese',
        'mackerel' => 'mackerel',
        'Maltese' => 'Maltese',
        'mews' => 'mews',
        'moose' => 'moose',
        'mumps' => 'mumps',
        'Nankingese' => 'Nankingese',
        'news' => 'news',
        'nexus' => 'nexus',
        'Niasese' => 'Niasese',
        'Pekingese' => 'Pekingese',
        'Piedmontese' => 'Piedmontese',
        'pincers' => 'pincers',
        'Pistoiese' => 'Pistoiese',
        'pliers' => 'pliers',
        'Portuguese' => 'Portuguese',
        'proceedings' => 'proceedings',
        'rabies' => 'rabies',
        'rice' => 'rice',
        'rhinoceros' => 'rhinoceros',
        'salmon' => 'salmon',
        'Sarawakese' => 'Sarawakese',
        'scissors' => 'scissors',
        'series' => 'series',
        'Shavese' => 'Shavese',
        'shears' => 'shears',
        'siemens' => 'siemens',
        'species' => 'species',
        'swine' => 'swine',
        'testes' => 'testes',
        'trousers' => 'trousers',
        'trout' => 'trout',
        'tuna' => 'tuna',
        'Vermontese' => 'Vermontese',
        'Wenchowese' => 'Wenchowese',
        'whiting' => 'whiting',
        'wildebeest' => 'wildebeest',
        'Yengeese' => 'Yengeese',
    ];
    /**
     * @var array fallback map for transliteration used by [[transliterate()]] when intl isn't available.
     */
    public static $transliteration = [
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE', 'Ç' => 'C',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
        'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ő' => 'O',
        'Ø' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ű' => 'U', 'Ý' => 'Y', 'Þ' => 'TH',
        'ß' => 'ss',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ő' => 'o',
        'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ű' => 'u', 'ý' => 'y', 'þ' => 'th',
        'ÿ' => 'y',
    ];
    /**
     * Shortcut for `Any-Latin; NFKD` transliteration rule. The rule is strict, letters will be transliterated with
     * the closest sound-representation chars. The result may contain any UTF-8 chars. For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huò qǔ dào dochira Ukraí̈nsʹka: g̀,ê, Srpska: đ, n̂, d̂! ¿Español?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_STRICT = 'Any-Latin; NFKD';
    /**
     * Shortcut for `Any-Latin; Latin-ASCII` transliteration rule. The rule is medium, letters will be
     * transliterated to characters of Latin-1 (ISO 8859-1) ASCII table. For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huo qu dao dochira Ukrainsʹka: g,e, Srpska: d, n, d! ¿Espanol?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_MEDIUM = 'Any-Latin; Latin-ASCII';
    /**
     * Shortcut for `Any-Latin; Latin-ASCII; [\u0080-\uffff] remove` transliteration rule. The rule is loose,
     * letters will be transliterated with the characters of Basic Latin Unicode Block.
     * For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huo qu dao dochira Ukrainska: g,e, Srpska: d, n, d! Espanol?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_LOOSE = 'Any-Latin; Latin-ASCII; [\u0080-\uffff] remove';

    /**
     * @var mixed Either a [[\Transliterator]], or a string from which a [[\Transliterator]] can be built
     * for transliteration. Used by [[transliterate()]] when intl is available. Defaults to [[TRANSLITERATE_LOOSE]]
     * @see http://php.net/manual/en/transliterator.transliterate.php
     */
    public static $transliterator = self::TRANSLITERATE_LOOSE;


    /**
     * Converts a word to its plural form.
     * Note that this is for English only!
     * For example, 'apple' will become 'apples', and 'child' will become 'children'.
     * @param string $word the word to be pluralized
     * @return string the pluralized word
     */
    public static function pluralize($word)
    {
        if (isset(static::$specials[$word])) {
            return static::$specials[$word];
        }
        foreach (static::$plurals as $rule => $replacement) {
            if (preg_match($rule, $word)) {
                return preg_replace($rule, $replacement, $word);
            }
        }

        return $word;
    }

    /**
     * Returns the singular of the $word
     * @param string $word the english word to singularize
     * @return string Singular noun.
     */
    public static function singularize($word)
    {
        $result = array_search($word, static::$specials, true);
        if ($result !== false) {
            return $result;
        }
        foreach (static::$singulars as $rule => $replacement) {
            if (preg_match($rule, $word)) {
                return preg_replace($rule, $replacement, $word);
            }
        }

        return $word;
    }

    /**
     * Converts an underscored or CamelCase word into an English
     * sentence.
     * @param string $words
     * @param boolean $ucAll whether to set all words to uppercase
     * @return string
     */
    public static function titleize($words, $ucAll = false)
    {
        $words = static::humanize(static::underscore($words), $ucAll);

        return $ucAll ? ucwords($words) : ucfirst($words);
    }

    /**
     * Returns given word as CamelCased
     * Converts a word like "send_email" to "SendEmail". It
     * will remove non alphanumeric character from the word, so
     * "who's online" will be converted to "WhoSOnline"
     * @see variablize()
     * @param string $word the word to CamelCase
     * @return string
     */
    public static function camelize($word)
    {
        return str_replace(' ', '', ucwords(preg_replace('/[^A-Za-z0-9]+/', ' ', $word)));
    }

    /**
     * Converts a CamelCase name into space-separated words.
     * For example, 'PostTag' will be converted to 'Post Tag'.
     * @param string $name the string to be converted
     * @param boolean $ucwords whether to capitalize the first letter in each word
     * @return string the resulting words
     */
    public static function camel2words($name, $ucwords = true)
    {
        $label = trim(strtolower(str_replace([
            '-',
            '_',
            '.'
        ], ' ', preg_replace('/(?<![A-Z])[A-Z]/', ' \0', $name))));

        return $ucwords ? ucwords($label) : $label;
    }

    /**
     * Converts a CamelCase name into an ID in lowercase.
     * Words in the ID may be concatenated using the specified character (defaults to '-').
     * For example, 'PostTag' will be converted to 'post-tag'.
     * @param string $name the string to be converted
     * @param string $separator the character used to concatenate the words in the ID
     * @param boolean|string $strict whether to insert a separator between two consecutive uppercase chars, defaults to false
     * @return string the resulting ID
     */
    public static function camel2id($name, $separator = '-', $strict = false)
    {
        $regex = $strict ? '/[A-Z]/' : '/(?<![A-Z])[A-Z]/';
        if ($separator === '_') {
            return trim(strtolower(preg_replace($regex, '_\0', $name)), '_');
        } else {
            return trim(strtolower(str_replace('_', $separator, preg_replace($regex, $separator . '\0', $name))), $separator);
        }
    }

    /**
     * Converts an ID into a CamelCase name.
     * Words in the ID separated by `$separator` (defaults to '-') will be concatenated into a CamelCase name.
     * For example, 'post-tag' is converted to 'PostTag'.
     * @param string $id the ID to be converted
     * @param string $separator the character used to separate the words in the ID
     * @return string the resulting CamelCase name
     */
    public static function id2camel($id, $separator = '-')
    {
        return str_replace(' ', '', ucwords(implode(' ', explode($separator, $id))));
    }

    /**
     * Converts any "CamelCased" into an "underscored_word".
     * @param string $words the word(s) to underscore
     * @return string
     */
    public static function underscore($words)
    {
        return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
    }

    /**
     * Returns a human-readable string from $word
     * @param string $word the string to humanize
     * @param boolean $ucAll whether to set all words to uppercase or not
     * @return string
     */
    public static function humanize($word, $ucAll = false)
    {
        $word = str_replace('_', ' ', preg_replace('/_id$/', '', $word));

        return $ucAll ? ucwords($word) : ucfirst($word);
    }

    /**
     * Same as camelize but first char is in lowercase.
     * Converts a word like "send_email" to "sendEmail". It
     * will remove non alphanumeric character from the word, so
     * "who's online" will be converted to "whoSOnline"
     * @param string $word to lowerCamelCase
     * @return string
     */
    public static function variablize($word)
    {
        $word = static::camelize($word);

        return strtolower($word[0]) . substr($word, 1);
    }

    /**
     * Converts a class name to its table name (pluralized)
     * naming conventions. For example, converts "Person" to "people"
     * @param string $className the class name for getting related table_name
     * @return string
     */
    public static function tableize($className)
    {
        return static::pluralize(static::underscore($className));
    }

    /**
     * Returns a string with all spaces converted to given replacement,
     * non word characters removed and the rest of characters transliterated.
     *
     * If intl extension isn't available uses fallback that converts latin characters only
     * and removes the rest. You may customize characters map via $transliteration property
     * of the helper.
     *
     * @param string $string An arbitrary string to convert
     * @param string $replacement The replacement to use for spaces
     * @param boolean $lowercase whether to return the string in lowercase or not. Defaults to `true`.
     * @return string The converted string.
     */
    public static function slug($string, $replacement = '-', $lowercase = true)
    {
        $string = static::transliterate($string);
        $string = preg_replace('/[^a-zA-Z0-9=\s—–-]+/u', '', $string);
        $string = preg_replace('/[=\s—–-]+/u', $replacement, $string);
        $string = trim($string, $replacement);

        return $lowercase ? strtolower($string) : $string;
    }

    /**
     * Returns transliterated version of a string.
     *
     * If intl extension isn't available uses fallback that converts latin characters only
     * and removes the rest. You may customize characters map via $transliteration property
     * of the helper.
     *
     * @param string $string input string
     * @param string|\Transliterator $transliterator either a [[Transliterator]] or a string
     * from which a [[Transliterator]] can be built.
     * @return string
     * @since 2.0.7 this method is public.
     */
    public static function transliterate($string, $transliterator = null)
    {
        if (static::hasIntl()) {
            if ($transliterator === null) {
                $transliterator = static::$transliterator;
            }

            return transliterator_transliterate($transliterator, $string);
        } else {
            return strtr($string, static::$transliteration);
        }
    }

    /**
     * @return boolean if intl extension is loaded
     */
    protected static function hasIntl()
    {
        return extension_loaded('intl');
    }

    /**
     * Converts a table name to its class name. For example, converts "people" to "Person"
     * @param string $tableName
     * @return string
     */
    public static function classify($tableName)
    {
        return static::camelize(static::singularize($tableName));
    }

    /**
     * Converts number to its ordinal English form. For example, converts 13 to 13th, 2 to 2nd ...
     * @param integer $number the number to get its ordinal value
     * @return string
     */
    public static function ordinalize($number)
    {
        if (in_array($number % 100, range(11, 13))) {
            return $number . 'th';
        }
        switch ($number % 10) {
            case 1:
                return $number . 'st';
            case 2:
                return $number . 'nd';
            case 3:
                return $number . 'rd';
            default:
                return $number . 'th';
        }
    }

    /**
     * Converts a list of words into a sentence.
     *
     * Special treatment is done for the last few words. For example,
     *
     * ```php
     * $words = ['Spain', 'France'];
     * echo Inflector::sentence($words);
     * // output: Spain and France
     *
     * $words = ['Spain', 'France', 'Italy'];
     * echo Inflector::sentence($words);
     * // output: Spain, France and Italy
     *
     * $words = ['Spain', 'France', 'Italy'];
     * echo Inflector::sentence($words, ' & ');
     * // output: Spain, France & Italy
     * ```
     *
     * @param array $words the words to be converted into a string
     * @param string $twoWordsConnector the string connecting words when there are only two
     * @param string $lastWordConnector the string connecting the last two words. If this is null, it will
     * take the value of `$twoWordsConnector`.
     * @param string $connector the string connecting words other than those connected by
     * $lastWordConnector and $twoWordsConnector
     * @return string the generated sentence
     * @since 2.0.1
     */
    public static function sentence(array $words, $twoWordsConnector = ' and ', $lastWordConnector = null, $connector = ', ')
    {
        if ($lastWordConnector === null) {
            $lastWordConnector = $twoWordsConnector;
        }
        switch (count($words)) {
            case 0:
                return '';
            case 1:
                return reset($words);
            case 2:
                return implode($twoWordsConnector, $words);
            default:
                return implode($connector, array_slice($words, 0, -1)) . $lastWordConnector . end($words);
        }
    }
}

Here are some examples:

echo "Inflector Test";
require('PhInflector.php');
echo "<hr>";
echo PhInflector::slug('Höäpeäöäich Médsui27:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('HIJO"$(/&T §!"(/&T"§:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('38917 jiodj d                         ! *"29p');
echo "<hr>";
echo PhInflector::slug('каи циефле ///!!!');

The code is also available on GitHub: click here.

#8


I'm sure you can google to find plenty of libs that do this.

But if you feel like coding, you could try the reverse process: start with the singular words from a dictionary (download a free one, such as those used by aspell), apply pluralization rules, collect the mappings, and reverse their direction. For "type" you would pluralize to "types", and the reverse mapping would work as expected. While there are exceptions here too, it is slightly easier to pluralize reliably. I did this a while back (in the mid-90s... :-) ) for an online game (a MUD), where descriptions for multiple identical items were concatenated and automatic pluralization was needed.
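
A small sketch of that reverse process, assuming Python. The pluralizer below is deliberately tiny and the word list is a stand-in for a real dictionary file; both are assumptions for illustration only.

```python
import re


def pluralize(word: str) -> str:
    """A deliberately small pluralizer; a real one would carry more rules."""
    if re.search(r"(s|x|z|ch|sh)$", word):
        return word + "es"
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"
    return word + "s"


# Stand-in for the singular words of a downloaded dictionary.
SINGULAR_WORDS = ["type", "pie", "tree", "bus", "category", "box"]

# Pluralize every known singular, then flip the mapping around.
PLURAL_TO_SINGULAR = {pluralize(word): word for word in SINGULAR_WORDS}

for plural in ["types", "pies", "buses", "categories"]:
    print(plural, "->", PLURAL_TO_SINGULAR[plural])
```

The lookup only knows words that were in the singular list, so a table name outside the dictionary still needs a fallback.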

Also: given that it's a finite number of tables, you could just use the simplest algorithm, get the raw output, eyeball it, and fix the error cases manually. :-)

#9


I think you have to use a list to translate plural into singular for some special words (in your example Types->Type).

I think you could have a look at the source code of CakePHP (you might start your search here). They use such an algorithm on their table names and field names to automagically join tables.


[Edit:] Here is some scientific work to read about "Plural inflection in English".

#10


I'm going to try this MorphAdorner: http://morphadorner.northwestern.edu/morphadorner/download/ (Java). It's a collection of different types of NLP processing tools, and you can test them through online examples. For your problem (that is also my problem) there's the Pluralizer tool: http://morphadorner.northwestern.edu/morphadorner/pluralizer/example/

#11


Consider the Python package "inflect":

"Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words"

https://pypi.python.org/pypi/inflect

#12


I just encountered this problem and developed a solution in 10 minutes.

I think @paxdiablo offers a good idea: build a transformation engine and add rules to it. I built one dictionary rule and three common rules. The dictionary rule consults a dict file to look up exception cases, while the three common rules handle "ies", "es" and "s" respectively.

However, it may take too much time to add all exceptions to the dictionary, e.g., pies/trees/bus etc. One improvement I made to deal with these words is to make sure the result can be converted back.

E.g., if we incorrectly apply the remove-"es" rule to "trees" and convert it to "tre", then when we add the plural ending back we get "tres", which doesn't equal the original "trees", so we know the "es" rule should not be applied. This method can handle the exceptions mentioned above without adding them to a dictionary file.
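
The round-trip check might be sketched like this in Python. The pluralizer and the three suffix rules are simplified assumptions, not the actual code behind this answer, and the dictionary rule is left out.

```python
def pluralize(word):
    """Minimal pluralizer used only for the round-trip check."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


# (plural suffix, singular replacement), tried in this order.
RULES = [("ies", "y"), ("es", ""), ("s", "")]


def singularize(word):
    """Accept a rule only if pluralizing its result restores the input."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if pluralize(candidate) == word:
                return candidate
    return word  # nothing round-tripped, so leave the word unchanged


for plural in ["trees", "types", "buses", "categories", "pies"]:
    print(plural, "->", singularize(plural))
```

Note that with this toy pluralizer "pies" still comes out as "py", because "py" pluralizes back to "pies". Words like that would still need a dictionary entry.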

I ended up with a dictionary file of 42 truly exceptional words, and it can handle most cases.

#13


There's a nice implementation of an inflector in the uNnAddIns project that even includes an experimental Spanish inflector. The idea is borrowed from the Rails Inflector module.

It can also be used for other things, such as converting CamelCase to normal text and, for example, generating browser-friendly URLs from titles.

Fish_NNS

Output example:

$ cat test.txt | ./morpha -c
Type
Pie
Tree
Bus
Radius
Community
Sheep
Fish

#6


As an improvement, you could use rules that generate multiple possibilities and then look up the results in a dictionary to weed out impossible options.

作为改进,您可以使用生成多种可能性的规则,然后在字典中查找结果以清除不可能的选项。

For example replace -ies with -y and -ie. Pies becomes Py and Pie. Only one of those is in the dictionary, so choose that one.

例如,用-y和-ie替换-ies。派成为Py和Pie。其中只有一个在字典中,所以选择那个。

Perhaps you can even find a dictionary with frequency information and select the most common word you generate.

也许您甚至可以找到包含频率信息的字典,并选择您生成的最常用字词。

If you combine this with an ordered list of rules that covers a few exceptions, you might get pretty good accuracy.

如果将此与包含少数例外情况的有序规则列表结合使用,则可能会获得非常好的准确性。

#7


Maybe you need this,It works well ,if you know how to use PHP script.It can turn plural words to single words,and turn single words to plural words too.

也许你需要这个,它运作良好,如果你知道如何使用PHP脚本。它可以将多个单词转换为单个单词,并将单个单词转换为复数单词。

class BaseInflector
{
    /**
     * @var array the rules for converting a word into its plural form.
     * The keys are the regular expressions and the values are the corresponding replacements.
     */
    public static $plurals = [
        '/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media)$/i' => '\1',
        '/^(sea[- ]bass)$/i' => '\1',
        '/(m)ove$/i' => '\1oves',
        '/(f)oot$/i' => '\1eet',
        '/(h)uman$/i' => '\1umans',
        '/(s)tatus$/i' => '\1tatuses',
        '/(s)taff$/i' => '\1taff',
        '/(t)ooth$/i' => '\1eeth',
        '/(quiz)$/i' => '\1zes',
        '/^(ox)$/i' => '\1\2en',
        '/([m|l])ouse$/i' => '\1ice',
        '/(matr|vert|ind)(ix|ex)$/i' => '\1ices',
        '/(x|ch|ss|sh)$/i' => '\1es',
        '/([^aeiouy]|qu)y$/i' => '\1ies',
        '/(hive)$/i' => '\1s',
        '/(?:([^f])fe|([lr])f)$/i' => '\1\2ves',
        '/sis$/i' => 'ses',
        '/([ti])um$/i' => '\1a',
        '/(p)erson$/i' => '\1eople',
        '/(m)an$/i' => '\1en',
        '/(c)hild$/i' => '\1hildren',
        '/(buffal|tomat|potat|ech|her|vet)o$/i' => '\1oes',
        '/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|vir)us$/i' => '\1i',
        '/us$/i' => 'uses',
        '/(alias)$/i' => '\1es',
        '/(ax|cris|test)is$/i' => '\1es',
        '/s$/' => 's',
        '/^$/' => '',
        '/$/' => 's',
    ];
    /**
     * @var array the rules for converting a word into its singular form.
     * The keys are the regular expressions and the values are the corresponding replacements.
     */
    public static $singulars = [
        '/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media|ss)$/i' => '\1',
        '/^(sea[- ]bass)$/i' => '\1',
        '/(s)tatuses$/i' => '\1tatus',
        '/(f)eet$/i' => '\1oot',
        '/(t)eeth$/i' => '\1ooth',
        '/^(.*)(menu)s$/i' => '\1\2',
        '/(quiz)zes$/i' => '\\1',
        '/(matr)ices$/i' => '\1ix',
        '/(vert|ind)ices$/i' => '\1ex',
        '/^(ox)en/i' => '\1',
        '/(alias)(es)*$/i' => '\1',
        '/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|viri?)i$/i' => '\1us',
        '/([ftw]ax)es/i' => '\1',
        '/(cris|ax|test)es$/i' => '\1is',
        '/(shoe|slave)s$/i' => '\1',
        '/(o)es$/i' => '\1',
        '/ouses$/' => 'ouse',
        '/([^a])uses$/' => '\1us',
        '/([m|l])ice$/i' => '\1ouse',
        '/(x|ch|ss|sh)es$/i' => '\1',
        '/(m)ovies$/i' => '\1\2ovie',
        '/(s)eries$/i' => '\1\2eries',
        '/([^aeiouy]|qu)ies$/i' => '\1y',
        '/([lr])ves$/i' => '\1f',
        '/(tive)s$/i' => '\1',
        '/(hive)s$/i' => '\1',
        '/(drive)s$/i' => '\1',
        '/([^fo])ves$/i' => '\1fe',
        '/(^analy)ses$/i' => '\1sis',
        '/(analy|diagno|^ba|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => '\1\2sis',
        '/([ti])a$/i' => '\1um',
        '/(p)eople$/i' => '\1\2erson',
        '/(m)en$/i' => '\1an',
        '/(c)hildren$/i' => '\1\2hild',
        '/(n)ews$/i' => '\1\2ews',
        '/(n)etherlands$/i' => '\1\2etherlands',
        '/eaus$/' => 'eau',
        '/^(.*us)$/' => '\\1',
        '/s$/i' => '',
    ];
    /**
     * @var array the special rules for converting a word between its plural form and singular form.
     * The keys are the special words in singular form, and the values are the corresponding plural form.
     */
    public static $specials = [
        'atlas' => 'atlases',
        'beef' => 'beefs',
        'brother' => 'brothers',
        'cafe' => 'cafes',
        'child' => 'children',
        'cookie' => 'cookies',
        'corpus' => 'corpuses',
        'cow' => 'cows',
        'curve' => 'curves',
        'foe' => 'foes',
        'ganglion' => 'ganglions',
        'genie' => 'genies',
        'genus' => 'genera',
        'graffito' => 'graffiti',
        'hoof' => 'hoofs',
        'loaf' => 'loaves',
        'man' => 'men',
        'money' => 'monies',
        'mongoose' => 'mongooses',
        'move' => 'moves',
        'mythos' => 'mythoi',
        'niche' => 'niches',
        'numen' => 'numina',
        'occiput' => 'occiputs',
        'octopus' => 'octopuses',
        'opus' => 'opuses',
        'ox' => 'oxen',
        'penis' => 'penises',
        'sex' => 'sexes',
        'soliloquy' => 'soliloquies',
        'testis' => 'testes',
        'trilby' => 'trilbys',
        'turf' => 'turfs',
        'wave' => 'waves',
        'Amoyese' => 'Amoyese',
        'bison' => 'bison',
        'Borghese' => 'Borghese',
        'bream' => 'bream',
        'breeches' => 'breeches',
        'britches' => 'britches',
        'buffalo' => 'buffalo',
        'cantus' => 'cantus',
        'carp' => 'carp',
        'chassis' => 'chassis',
        'clippers' => 'clippers',
        'cod' => 'cod',
        'coitus' => 'coitus',
        'Congoese' => 'Congoese',
        'contretemps' => 'contretemps',
        'corps' => 'corps',
        'debris' => 'debris',
        'diabetes' => 'diabetes',
        'djinn' => 'djinn',
        'eland' => 'eland',
        'elk' => 'elk',
        'equipment' => 'equipment',
        'Faroese' => 'Faroese',
        'flounder' => 'flounder',
        'Foochowese' => 'Foochowese',
        'gallows' => 'gallows',
        'Genevese' => 'Genevese',
        'Genoese' => 'Genoese',
        'Gilbertese' => 'Gilbertese',
        'graffiti' => 'graffiti',
        'headquarters' => 'headquarters',
        'herpes' => 'herpes',
        'hijinks' => 'hijinks',
        'Hottentotese' => 'Hottentotese',
        'information' => 'information',
        'innings' => 'innings',
        'jackanapes' => 'jackanapes',
        'Kiplingese' => 'Kiplingese',
        'Kongoese' => 'Kongoese',
        'Lucchese' => 'Lucchese',
        'mackerel' => 'mackerel',
        'Maltese' => 'Maltese',
        'mews' => 'mews',
        'moose' => 'moose',
        'mumps' => 'mumps',
        'Nankingese' => 'Nankingese',
        'news' => 'news',
        'nexus' => 'nexus',
        'Niasese' => 'Niasese',
        'Pekingese' => 'Pekingese',
        'Piedmontese' => 'Piedmontese',
        'pincers' => 'pincers',
        'Pistoiese' => 'Pistoiese',
        'pliers' => 'pliers',
        'Portuguese' => 'Portuguese',
        'proceedings' => 'proceedings',
        'rabies' => 'rabies',
        'rice' => 'rice',
        'rhinoceros' => 'rhinoceros',
        'salmon' => 'salmon',
        'Sarawakese' => 'Sarawakese',
        'scissors' => 'scissors',
        'series' => 'series',
        'Shavese' => 'Shavese',
        'shears' => 'shears',
        'siemens' => 'siemens',
        'species' => 'species',
        'swine' => 'swine',
        'testes' => 'testes',
        'trousers' => 'trousers',
        'trout' => 'trout',
        'tuna' => 'tuna',
        'Vermontese' => 'Vermontese',
        'Wenchowese' => 'Wenchowese',
        'whiting' => 'whiting',
        'wildebeest' => 'wildebeest',
        'Yengeese' => 'Yengeese',
    ];
    /**
     * @var array fallback map for transliteration used by [[transliterate()]] when intl isn't available.
     */
    public static $transliteration = [
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE', 'Ç' => 'C',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
        'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ő' => 'O',
        'Ø' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ű' => 'U', 'Ý' => 'Y', 'Þ' => 'TH',
        'ß' => 'ss',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ő' => 'o',
        'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ű' => 'u', 'ý' => 'y', 'þ' => 'th',
        'ÿ' => 'y',
    ];
    /**
     * Shortcut for `Any-Latin; NFKD` transliteration rule. The rule is strict, letters will be transliterated with
     * the closest sound-representation chars. The result may contain any UTF-8 chars. For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huò qǔ dào dochira Ukraí̈nsʹka: g̀,ê, Srpska: đ, n̂, d̂! ¿Español?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_STRICT = 'Any-Latin; NFKD';
    /**
     * Shortcut for `Any-Latin; Latin-ASCII` transliteration rule. The rule is medium, letters will be
     * transliterated to characters of Latin-1 (ISO 8859-1) ASCII table. For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huo qu dao dochira Ukrainsʹka: g,e, Srpska: d, n, d! ¿Espanol?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_MEDIUM = 'Any-Latin; Latin-ASCII';
    /**
     * Shortcut for `Any-Latin; Latin-ASCII; [\u0080-\uffff] remove` transliteration rule. The rule is loose,
     * letters will be transliterated with the characters of Basic Latin Unicode Block.
     * For example:
     * `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
     * `huo qu dao dochira Ukrainska: g,e, Srpska: d, n, d! Espanol?`
     *
     * Used in [[transliterate()]].
     * For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
     * @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
     * @see transliterate()
     * @since 2.0.7
     */
    const TRANSLITERATE_LOOSE = 'Any-Latin; Latin-ASCII; [\u0080-\uffff] remove';

    /**
     * @var mixed Either a [[\Transliterator]], or a string from which a [[\Transliterator]] can be built
     * for transliteration. Used by [[transliterate()]] when intl is available. Defaults to [[TRANSLITERATE_LOOSE]]
     * @see http://php.net/manual/en/transliterator.transliterate.php
     */
    public static $transliterator = self::TRANSLITERATE_LOOSE;


    /**
     * Converts a word to its plural form.
     * Note that this is for English only!
     * For example, 'apple' will become 'apples', and 'child' will become 'children'.
     * @param string $word the word to be pluralized
     * @return string the pluralized word
     */
    public static function pluralize($word)
    {
        if (isset(static::$specials[$word])) {
            return static::$specials[$word];
        }
        foreach (static::$plurals as $rule => $replacement) {
            if (preg_match($rule, $word)) {
                return preg_replace($rule, $replacement, $word);
            }
        }

        return $word;
    }

    /**
     * Returns the singular of the $word
     * @param string $word the english word to singularize
     * @return string Singular noun.
     */
    public static function singularize($word)
    {
        $result = array_search($word, static::$specials, true);
        if ($result !== false) {
            return $result;
        }
        foreach (static::$singulars as $rule => $replacement) {
            if (preg_match($rule, $word)) {
                return preg_replace($rule, $replacement, $word);
            }
        }

        return $word;
    }

    /**
     * Converts an underscored or CamelCase word into a English
     * sentence.
     * @param string $words
     * @param boolean $ucAll whether to set all words to uppercase
     * @return string
     */
    public static function titleize($words, $ucAll = false)
    {
        $words = static::humanize(static::underscore($words), $ucAll);

        return $ucAll ? ucwords($words) : ucfirst($words);
    }

    /**
     * Returns given word as CamelCased
     * Converts a word like "send_email" to "SendEmail". It
     * will remove non alphanumeric character from the word, so
     * "who's online" will be converted to "WhoSOnline"
     * @see variablize()
     * @param string $word the word to CamelCase
     * @return string
     */
    public static function camelize($word)
    {
        return str_replace(' ', '', ucwords(preg_replace('/[^A-Za-z0-9]+/', ' ', $word)));
    }

    /**
     * Converts a CamelCase name into space-separated words.
     * For example, 'PostTag' will be converted to 'Post Tag'.
     * @param string $name the string to be converted
     * @param boolean $ucwords whether to capitalize the first letter in each word
     * @return string the resulting words
     */
    public static function camel2words($name, $ucwords = true)
    {
        $label = trim(strtolower(str_replace([
            '-',
            '_',
            '.'
        ], ' ', preg_replace('/(?<![A-Z])[A-Z]/', ' \0', $name))));

        return $ucwords ? ucwords($label) : $label;
    }

    /**
     * Converts a CamelCase name into an ID in lowercase.
     * Words in the ID may be concatenated using the specified character (defaults to '-').
     * For example, 'PostTag' will be converted to 'post-tag'.
     * @param string $name the string to be converted
     * @param string $separator the character used to concatenate the words in the ID
     * @param boolean|string $strict whether to insert a separator between two consecutive uppercase chars, defaults to false
     * @return string the resulting ID
     */
    public static function camel2id($name, $separator = '-', $strict = false)
    {
        $regex = $strict ? '/[A-Z]/' : '/(?<![A-Z])[A-Z]/';
        if ($separator === '_') {
            return trim(strtolower(preg_replace($regex, '_\0', $name)), '_');
        } else {
            return trim(strtolower(str_replace('_', $separator, preg_replace($regex, $separator . '\0', $name))), $separator);
        }
    }

    /**
     * Converts an ID into a CamelCase name.
     * Words in the ID separated by `$separator` (defaults to '-') will be concatenated into a CamelCase name.
     * For example, 'post-tag' is converted to 'PostTag'.
     * @param string $id the ID to be converted
     * @param string $separator the character used to separate the words in the ID
     * @return string the resulting CamelCase name
     */
    public static function id2camel($id, $separator = '-')
    {
        return str_replace(' ', '', ucwords(implode(' ', explode($separator, $id))));
    }

    /**
     * Converts any "CamelCased" into an "underscored_word".
     * @param string $words the word(s) to underscore
     * @return string
     */
    public static function underscore($words)
    {
        return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
    }

    /**
     * Returns a human-readable string from $word
     * @param string $word the string to humanize
     * @param boolean $ucAll whether to set all words to uppercase or not
     * @return string
     */
    public static function humanize($word, $ucAll = false)
    {
        $word = str_replace('_', ' ', preg_replace('/_id$/', '', $word));

        return $ucAll ? ucwords($word) : ucfirst($word);
    }

    /**
     * Same as camelize but first char is in lowercase.
     * Converts a word like "send_email" to "sendEmail". It
     * will remove non alphanumeric character from the word, so
     * "who's online" will be converted to "whoSOnline"
     * @param string $word to lowerCamelCase
     * @return string
     */
    public static function variablize($word)
    {
        $word = static::camelize($word);

        return strtolower($word[0]) . substr($word, 1);
    }

    /**
     * Converts a class name to its table name (pluralized)
     * naming conventions. For example, converts "Person" to "people"
     * @param string $className the class name for getting related table_name
     * @return string
     */
    public static function tableize($className)
    {
        return static::pluralize(static::underscore($className));
    }

    /**
     * Returns a string with all spaces converted to given replacement,
     * non word characters removed and the rest of characters transliterated.
     *
     * If intl extension isn't available uses fallback that converts latin characters only
     * and removes the rest. You may customize characters map via $transliteration property
     * of the helper.
     *
     * @param string $string An arbitrary string to convert
     * @param string $replacement The replacement to use for spaces
     * @param boolean $lowercase whether to return the string in lowercase or not. Defaults to `true`.
     * @return string The converted string.
     */
    public static function slug($string, $replacement = '-', $lowercase = true)
    {
        $string = static::transliterate($string);
        $string = preg_replace('/[^a-zA-Z0-9=\s—–-]+/u', '', $string);
        $string = preg_replace('/[=\s—–-]+/u', $replacement, $string);
        $string = trim($string, $replacement);

        return $lowercase ? strtolower($string) : $string;
    }

    /**
     * Returns transliterated version of a string.
     *
     * If intl extension isn't available uses fallback that converts latin characters only
     * and removes the rest. You may customize characters map via $transliteration property
     * of the helper.
     *
     * @param string $string input string
     * @param string|\Transliterator $transliterator either a [[Transliterator]] or a string
     * from which a [[Transliterator]] can be built.
     * @return string
     * @since 2.0.7 this method is public.
     */
    public static function transliterate($string, $transliterator = null)
    {
        if (static::hasIntl()) {
            if ($transliterator === null) {
                $transliterator = static::$transliterator;
            }

            return transliterator_transliterate($transliterator, $string);
        } else {
            return strtr($string, static::$transliteration);
        }
    }

    /**
     * @return boolean if intl extension is loaded
     */
    protected static function hasIntl()
    {
        return extension_loaded('intl');
    }

    /**
     * Converts a table name to its class name. For example, converts "people" to "Person"
     * @param string $tableName
     * @return string
     */
    public static function classify($tableName)
    {
        return static::camelize(static::singularize($tableName));
    }

    /**
     * Converts number to its ordinal English form. For example, converts 13 to 13th, 2 to 2nd ...
     * @param integer $number the number to get its ordinal value
     * @return string
     */
    public static function ordinalize($number)
    {
        if (in_array($number % 100, range(11, 13))) {
            return $number . 'th';
        }
        switch ($number % 10) {
            case 1:
                return $number . 'st';
            case 2:
                return $number . 'nd';
            case 3:
                return $number . 'rd';
            default:
                return $number . 'th';
        }
    }

    /**
     * Converts a list of words into a sentence.
     *
     * Special treatment is done for the last few words. For example,
     *
     * ```php
     * $words = ['Spain', 'France'];
     * echo Inflector::sentence($words);
     * // output: Spain and France
     *
     * $words = ['Spain', 'France', 'Italy'];
     * echo Inflector::sentence($words);
     * // output: Spain, France and Italy
     *
     * $words = ['Spain', 'France', 'Italy'];
     * echo Inflector::sentence($words, ' & ');
     * // output: Spain, France & Italy
     * ```
     *
     * @param array $words the words to be converted into an string
     * @param string $twoWordsConnector the string connecting words when there are only two
     * @param string $lastWordConnector the string connecting the last two words. If this is null, it will
     * take the value of `$twoWordsConnector`.
     * @param string $connector the string connecting words other than those connected by
     * $lastWordConnector and $twoWordsConnector
     * @return string the generated sentence
     * @since 2.0.1
     */
    public static function sentence(array $words, $twoWordsConnector = ' and ', $lastWordConnector = null, $connector = ', ')
    {
        if ($lastWordConnector === null) {
            $lastWordConnector = $twoWordsConnector;
        }
        switch (count($words)) {
            case 0:
                return '';
            case 1:
                return reset($words);
            case 2:
                return implode($twoWordsConnector, $words);
            default:
                return implode($connector, array_slice($words, 0, -1)) . $lastWordConnector . end($words);
        }
    }
}

There is some example.

有一些例子。

echo "Inflector Test";
require('PhInflector.php');
echo "<hr>";
echo PhInflector::slug('Höäpeäöäich Médsui27:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('HIJO"$(/&T §!"(/&T"§:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('38917 jiodj d                         ! *"29p');
echo "<hr>";
echo PhInflector::slug('каи циефле ///!!!');

And forward github link click here.

转发github链接点击这里。

#8


I'm sure you can google to find plenty of libs that do this.

我相信你可以谷歌找到很多这样做的库。

But if you feel like coding, you could try the reverse process: start with singular words of dictionary (download free ones, used by aspell or whatever), use pluralization rule; collect mappings and switch the direction. For "type" you would pluralize to "types", and reverse mapping would work as expected. While there are exceptions here too it is slightly easier to reliably pluralize things. I did this a while back (in mid 90s... :-) ), for an online game (a MUD), where descriptions for multiple identical items were concatenatd, and automatic pluralization was needed.

但是如果你想编码,你可以尝试相反的过程:从字典的单数词开始(下载免费的,由aspell或其他使用),使用复数规则;收集映射并切换方向。对于“类型”,您将复数形式为“类型”,反向映射将按预期工作。虽然这里也有例外,但是可靠地使事物多元化更容易一些。我做了一段时间(90年代中期...... :-)),对于在线游戏(MUD),其中多个相同项目的描述被连接,并且需要自动复数。

Also: given that it's finite number of tables you could just use simplest algorithm, get raw output, eyeball it and fix error cases manually. :-)

另外:鉴于它是有限数量的表,您可以使用最简单的算法,获取原始输出,眼球并手动修复错误情况。 :-)

#9


I think you have to use a list to translate plural into singular for some special words (in your example Types->Type).

我认为您必须使用列表将复数转换为单数形式的某些特殊单词(在您的示例中类型 - >类型)。

I think you could have a look at the sourcecode of CakePHP (you might start your search here). They are using such an algorithm for their tablenames and fieldnames to automagically join tables.

我想你可以看一下CakePHP的源代码(你可以在这里开始搜索)。他们使用这样的算法为他们的表名和字段名自动连接表。


[Edit:] Here you have some scientific work to read about "Plural inflection in English"

[编辑:]在这里你有一些科学的工作可以阅读“英语中的多次变形”

#10


I'm going to try this MorphAdorner: http://morphadorner.northwestern.edu/morphadorner/download/ (Java). It's a collection of different types of NLP processing tools, and you can test them through online examples. For your problem (that is also my problem) there's the Pluralizer tool: http://morphadorner.northwestern.edu/morphadorner/pluralizer/example/

我将尝试使用MorphAdorner:http://morphadorner.northwestern.edu/morphadorner/download/(Java)。它是不同类型的NLP处理工具的集合,您可以通过在线示例对它们进行测试。对于你的问题(这也是我的问题),有Pluralizer工具:http://morphadorner.northwestern.edu/morphadorner/pluralizer/example/

#11


Consider the python package "inflect"

考虑一下python包“inflect”

"Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words"

“正确地生成复数,单数名词,序数,不定冠词;将数字转换为单词”

https://pypi.python.org/pypi/inflect

#12


I just encounter this problem and developed a solution in 10 mins.

我刚刚遇到这个问题,并在10分钟内开发出一个解决方案。

I think @paxdiablo provides a good thought on building a transforming engine and add rules. I build one dictionary rule and three common rules. The dictionary rule goes to a dict file to lookup exception cases, while the three common rules handle "ies", "es" and "s" respectively.

我认为@paxdiablo在构建转换引擎和添加规则方面提供了很好的思考。我构建了一个字典规则和三个常用规则。字典规则转到dict文件以查找异常情况,而三个通用规则分别处理“ies”,“es”和“s”。

However, it may takes too much time to add all exceptions to the dictionary, e.g., pies/trees/bus etc. One improvement I have made to deal with these words is to make sure it can be converted back.

但是,将所有异常添加到字典中可能需要太多时间,例如,馅饼/树/公共汽车等。我处理这些单词的一个改进是确保它可以被转换回来。

E.g., if we incorrectly apply the remove "es" rule to "trees" and convert it to "tre", when trying to add plural form back, you will get "tres", which doesn't equal to the original "tree" and you know the "es" rule should not be applied. This method can solve exceptions mentioned above without adding them to a dictionary file.

例如,如果我们错误地将删除“es”规则应用于“树”并将其转换为“tre”,则在尝试添加复数形式时,您将获得“tres”,这不等于原始“树”而且你知道不应该应用“es”规则。此方法可以解决上述异常,而无需将其添加到字典文件中。

I end up with a dictionary file of 42 truly exceptional words and it could handle most of the cases.

我最终得到了42个真正特殊单词的字典文件,它可以处理大多数情况。

#13


There's a nice implementation of an inflector in uNnAddIns project that even implements an experimental spanish inflector. The idea is caught from Rails Inflector module.

在uNnAddIns项目中有一个很好的变形器实现,甚至可以实现一个实验性的西班牙变形器。这个想法来自Rails Inflector模块。

It can be used as well for other things like converting from CamelCase to normal text and other goodies and for example generating browser friendly URLs from titles.

它也可以用于其他事情,例如从CamelCase转换为普通文本和其他好东西,例如从标题生成浏览器友好的URL。