Unable to form English words with Ruby

Time: 2020-12-21 12:45:40

I need to find all English words which can be formed from the letters in a string

 sentence="Ziegler's Giant Bar"

I can make an array of letters by

 sentence.split(//)  

How can I make more than 4500 English words from the sentence in Ruby?

[edit]

It may be best to split the problem into parts:

  1. to make only an array of words with 10 letters or less
  2. the longer words can be looked up separately

4 Solutions

#1


[Assuming you can reuse the source letters within one word]: For each word in your dictionary list, construct two arrays of letters - one for the candidate word and one for the input string. Subtract the input array-of-letters from the word array-of-letters and if there weren't any letters left over, you've got a match. Code to do that looks like this:

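def findWordsWithReplacement(sentence)
    out=[]
    splitArray=sentence.downcase.split(//)
    # String#each was removed in Ruby 1.9, so read the dictionary line by line
    File.foreach("/usr/share/dict/words") {|line|
        # strip (not strip!) because strip! returns nil when nothing changes
        word=line.strip
        if (word.downcase.split(//) - splitArray).empty?
            out.push word
        end
    }
    return out
end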

You can call that function from an irb session like so:

output=findWordsWithReplacement("some input string"); puts output.join(" ")

...or here's a wrapper you could use to call the function interactively from a script:

puts "enter the text."
ARGF.each {|line|
    puts "working..."
    out=findWordsWithReplacement(line)
    puts out.join(" ")
    puts "there were #{out.size} words."
}

When running this on a Mac, the output looks like this:

$ ./findwords.rb
enter the text.
Ziegler's Giant Bar
working...
A a aa aal aalii Aani Ab aba abaiser abalienate Abantes Abaris abas abase abaser Abasgi abasia Abassin abatable abate abater abatis abaze abb Abba abbas abbasi abbassi abbatial abbess Abbie Abe abear Abel abele Abelia Abelian Abelite abelite abeltree Aberia aberrant aberrate abet abettal Abie Abies abietate abietene abietin Abietineae Abiezer Abigail abigail abigeat abilla abintestate
[....]
Z z za Zabaean zabeta Z* zabra zabti zabtie zag zain Zan zanella zant zante Zanzalian zanze Zanzibari zar zaratite zareba zat zati zattare Zea zeal zealless zeallessness zebra zebrass Zebrina zebrine zee zein zeist zel Zelanian Zeltinger Zen Zenaga zenana zer zest zeta ziara ziarat zibeline zibet ziega zieger zig zigzag zigzagger Zilla zing zingel Zingiber zingiberene Zinnia zinsang Zinzar zira zirai Zirbanit Zirian Zirianian Zizania Zizia zizz
there were 6725 words.

That is well over 4500 words, but that's because the Mac word dictionary is pretty large. If you want to reproduce Knuth's results exactly, download and unzip Knuth's dictionary from here: http://www.packetstormsecurity.org/Crackers/wordlists/dictionaries/knuth_words.gz and replace "/usr/share/dict/words" with the path to wherever you've unpacked the substitute dictionary. If you did it right you'll get 4514 words, ending in this collection:

zanier zanies zaniness Zanzibar zazen zeal zebra zebras Zeiss zeitgeist Zen Zennist zest zestier zeta Ziegler zig zigging zigzag zigzagging zigzags zing zingier zings zinnia

I believe that answers the original question.
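
To see why the array-subtraction test amounts to reusing letters, here is a minimal sketch of how Ruby's Array#- behaves. The sample words are chosen only for illustration and are not taken from either dictionary.

```ruby
# Array#- removes every element of the receiver that also appears in the
# argument, regardless of how many times it occurs on either side.
source = "giant".split(//)          # ["g", "i", "a", "n", "t"]

# "titan" needs two t's but the source has only one; the difference is
# still empty, so subtraction alone treats the word as a match.
leftover_titan = "titan".split(//) - source

# "tango" uses an "o" that the source lacks, so one letter is left over.
leftover_tango = "tango".split(//) - source

p leftover_titan   # => []
p leftover_tango   # => ["o"]
```

Because the subtraction ignores how often a letter occurs, it only checks that each letter of the word appears somewhere in the input.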

Alternatively, the questioner/reader might have wanted to list all the words one can construct from a string without reusing any of the input letters. My suggested code to accomplish that works as follows: Copy the candidate word, then for each letter in the input string, destructively remove the first instance of that letter from the copy (using "slice!"). If this process absorbs all the letters, accept that word.

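def findWordsNoReplacement(sentence)
    out=[]
    splitInput=sentence.downcase.split(//)
    # String#each was removed in Ruby 1.9, so read the dictionary line by line
    File.foreach("/usr/share/dict/words") {|line|
        # strip (not strip!) because strip! returns nil when nothing changes
        word=line.strip
        copy=word.downcase
        # remove one occurrence of each input letter from the copy
        splitInput.each {|o| copy.slice!(o) }
        out.push word if copy==""
    }
    return out
end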
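
The same no-reuse check can also be sketched by counting letters instead of slicing a copy. This is an alternative formulation, not the code above: the helper names `letter_counts` and `buildable?` and the small `sample` list are hypothetical, and unlike the slice! version it ignores non-letter characters in the candidate word.

```ruby
# Count how often each letter (a-z only) occurs in a string.
def letter_counts(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]/) { |letter| counts[letter] += 1 }
  counts
end

# True when `word` needs no letter more often than the sentence supplies it.
def buildable?(word, sentence_counts)
  letter_counts(word).all? { |letter, n| n <= sentence_counts[letter] }
end

available = letter_counts("Ziegler's Giant Bar")

# Hypothetical stand-in for a dictionary file, to keep the sketch self-contained.
sample = %w[zebra giant titan eagle ruby]
matches = sample.select { |word| buildable?(word, available) }

p matches   # => ["zebra", "giant", "eagle"]
```

"titan" is rejected because the sentence contains only one "t", and "ruby" because it contains no "u" or "y".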

#2


If you want to find words whose letters and frequency thereof are restricted by the given phrase, you can construct a regex to do this for you:

sentence = "Ziegler's Giant Bar"

# count how many times each letter occurs in the 
# sentence (ignoring case, and removing non-letters)
counts = Hash.new(0)
sentence.downcase.gsub(/[^a-z]/,'').split(//).each do |letter|
  counts[letter] += 1
end
letters = counts.keys.join
length = counts.values.inject { |a,b| a + b }

# construct a regex that matches up to that many occurrences
# of only those letters, ignoring non-letters
# (in a positive lookahead)
length_regex = /(?=^(?:[^a-z]*[#{letters}]){1,#{length}}[^a-z]*$)/i
# construct regexes that match each letter up to its
# proper frequency (in a positive lookahead)
count_regexes = counts.map do |letter, count|
  /(?=^(?:[^#{letter}]*#{letter}){0,#{count}}[^#{letter}]*$)/i
end

# combine the regexes, to form a regex that will only
# match words that are made of a subset of the letters in the string
regex = /#{length_regex}#{count_regexes.join('')}/

# open a big file of words, and find all the ones that match
words = File.open("/usr/share/dict/words") do |f|
  f.map { |word| word.chomp }.find_all { |word| regex =~ word }
end

words.length #=> 3182
words #=> ["A", "a", "aa", "aal", "aalii", "Aani", "Ab", "aba", "abaiser", "Abantes",
          "Abaris", "abas", "abase", "abaser", "Abasgi", "abate", "abater", "abatis",
          ...
          "ba", "baa", "Baal", "baal", "Baalist", "Baalite", "Baalize", "baar", "bae",
          "Baeria", "baetzner", "bag", "baga", "bagani", "bagatine", "bagel", "bagganet",
          ...
          "eager", "eagle", "eaglet", "eagre", "ean", "ear", "earing", "earl", "earlet",
          "earn", "earner", "earnest", "earring", "eartab", "ease", "easel", "easer",
          ...
          "gab", "Gabe", "gabi", "gable", "gablet", "Gabriel", "Gael", "gaen", "gaet",
          "gag", "gagate", "gage", "gageable", "gagee", "gageite", "gager", "Gaia",
          ...
          "Iberian", "Iberis", "iberite", "ibis", "Ibsenite", "ie", "Ierne", "Igara",
          "Igbira", "ignatia", "ignite", "igniter", "Ila", "ilesite", "ilia", "Ilian",
          ...
          "laang", "lab", "Laban", "labia", "labiate", "labis", "labra", "labret", "laet",
          "laeti", "lag", "lagan", "lagen", "lagena", "lager", "laggar", "laggen",
          ...
          "Nabal", "Nabalite", "nabla", "nable", "nabs", "nae", "naegate", "naegates",
          "nael", "nag", "Naga", "naga", "Nagari", "nagger", "naggle", "nagster", "Naias",
          ...
          "Rab", "rab", "rabat", "rabatine", "Rabi", "rabies", "rabinet", "rag", "raga",
          "rage", "rager", "raggee", "ragger", "raggil", "raggle", "raging", "raglan",
          ...
          "sa", "saa", "Saan", "sab", "Saba", "Sabal", "Saban", "sabe", "saber",
          "saberleg", "Sabia", "S*", "Sabina", "sabina", "Sabine", "sabine", "Sabir",
          ...
          "tabes", "Tabira", "tabla", "table", "tabler", "tables", "tabling", "Tabriz",
          "tae", "tael", "taen", "taenia", "taenial", "tag", "Tagabilis", "Tagal",
          ...
          "zest", "zeta", "ziara", "ziarat", "zibeline", "zibet", "ziega", "zieger",
          "zig", "zing", "zingel", "Zingiber", "zira", "zirai", "Zirbanit", "Zirian"]

Positive lookaheads let you make a regex that matches a position in the string where some specified pattern matches without consuming the part of the string that matches. We use them here to match the same string against multiple patterns in a single regex. The position only matches if all our patterns match.
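
A reduced sketch of that idea, using just two hand-written frequency limits rather than the ones generated from the sentence (the variable names and test words here are illustrative only):

```ruby
# Two independent conditions, each wrapped in a positive lookahead anchored at
# the start of the string:
#   at most one "t" anywhere in the word
#   at most two "a"s anywhere in the word
at_most_one_t  = /(?=^(?:[^t]*t){0,1}[^t]*$)/
at_most_two_as = /(?=^(?:[^a]*a){0,2}[^a]*$)/

# Neither lookahead consumes input, so both are tested at the same position.
combined = /#{at_most_one_t}#{at_most_two_as}/

p(combined =~ "giant")    # => 0   (one t, one a: both lookaheads succeed)
p(combined =~ "titan")    # => nil (two t's: the first lookahead fails)
p(combined =~ "banana")   # => nil (three a's: the second lookahead fails)
```

The combined regex matches at position 0 without consuming any characters, which is why any number of such conditions can be chained.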

If we allow infinite reuse of letters from the original phrase (like Knuth did according to glenra's comment), then it's even easier to construct a regex:

sentence = "Ziegler's Giant Bar"

# find all the letters in the sentence
letters = sentence.downcase.gsub(/[^a-z]/,'').split(//).uniq

# construct a regex that matches any line in which
# the only letters used are the ones in the sentence
regex = /^([^a-z]|[#{letters.join}])*$/i

# open a big file of words, and find all the ones that match
words = File.open("/usr/share/dict/words") do |f|
  f.map { |word| word.chomp }.find_all { |word| regex =~ word }
end

words.length #=> 6725
words #=> ["A", "a", "aa", "aal", "aalii", "Aani", "Ab", "aba", "abaiser", "abalienate",
           ...
           "azine", "B", "b", "ba", "baa", "Baal", "baal", "Baalist", "Baalite",
           "Baalize", "baar", "Bab", "baba", "babai", "Babbie", "Babbitt", "babbitt",
           ...
           "Britannian", "britten", "brittle", "brittleness", "brittling", "Briza",
           "brizz", "E", "e", "ea", "eager", "eagerness", "eagle", "eagless", "eaglet",
           "eagre", "ean", "ear", "earing", "earl", "earless", "earlet", "earliness",
           ...
           "eternalize", "eternalness", "eternize", "etesian", "etna", "Etnean", "Etta",
           "Ettarre", "ettle", "ezba", "Ezra", "G", "g", "Ga", "ga", "gab", "gabber",
           "gabble", "gabbler", "Gabe", "gabelle", "gabeller", "gabgab", "gabi", "gable",
           ...
           "grittiness", "grittle", "Grizel", "Grizzel", "grizzle", "grizzler", "grr",
           "I", "i", "iba", "Iban", "Ibanag", "Iberes", "Iberi", "Iberia", "Iberian",
           ...
           "itinerarian", "itinerate", "its", "Itza", "Izar", "izar", "izle", "iztle",
           "L", "l", "la", "laager", "laang", "lab", "Laban", "labara", "labba", "labber",
           ...
           "litter", "litterer", "little", "littleness", "littling", "littress", "litz",
           "Liz", "Lizzie", "Llanberisslate", "N", "n", "na", "naa", "Naassenes", "nab",
           "Nabal", "Nabalite", "Nabataean", "Nabatean", "nabber", "nabla", "nable",
           ...
           "niter", "nitraniline", "nitrate", "nitratine", "Nitrian", "nitrile",
           "nitrite", "nitter", "R", "r", "ra", "Rab", "rab", "rabanna", "rabat",
           "rabatine", "rabatte", "rabbanist", "rabbanite", "rabbet", "rabbeting",
           ...
           "riteless", "ritelessness", "ritling", "rittingerite", "rizzar", "rizzle", "S",
           "s", "sa", "saa", "Saan", "sab", "Saba", "Sabaean", "sabaigrass", "Sabaist",
           ...
           "strigine", "string", "stringene", "stringent", "stringentness", "stringer",
           "stringiness", "stringing", "stringless", "strit", "T", "t", "ta", "taa",
           "Taal", "taar", "Tab", "tab", "tabaret", "tabbarea", "tabber", "tabbinet",
           ...
           "tsessebe", "tsetse", "tsia", "tsine", "tst", "tzaritza", "Tzental", "Z", "z",
           "za", "Zabaean", "zabeta", "Z*", "zabra", "zabti", "zabtie", "zag", "zain",
           ...
           "Zirian", "Zirianian", "Zizania", "Zizia", "zizz"]

#3


I don't think that Ruby has an English dictionary. But you could try to store all permutations of the original string in an array, and check those strings against Google? Say that a word is actually a word if it has more than 100,000 hits or something?
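
A rough sketch of the permutation idea on a very short input, assuming a small in-memory word set (`known_words`, hypothetical) as a stand-in for the search-engine check. It is only practical for short strings: a 16-letter input has 16! (about 2 × 10^13) full-length orderings alone.

```ruby
require 'set'

# Hypothetical stand-in for "is this a real word?"; a real check would
# consult a dictionary file or some external service.
known_words = Set.new(%w[a at ate eat tea])

letters = "tea".split(//)

# Build every ordering of every non-empty subset of the letters.
candidates = (1..letters.size).flat_map do |n|
  letters.permutation(n).map { |chars| chars.join }
end.uniq

found = candidates.select { |candidate| known_words.include?(candidate) }.sort

p candidates.size   # => 15
p found             # => ["a", "at", "ate", "eat", "tea"]
```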

#4


You can get an array of letters like so:

sentence = "Ziegler's Giant Bar"
letters = sentence.split(//)
