Parsing a large array into multiple sub-arrays

Time: 2022-11-07 12:17:26

I have a list of adjectives (found here), that I would like to be the basis for a "random_adjective(category)" method.

I'm really just taking a stab at this, as my first real attempt at a useful program.
Step 1: Open file, remove formatting. No problem.

list = File.read('adjectivelist')
list = list.gsub(/\n/, " ")  # gsub returns a new string, so keep the result

The next step is to break the string up by category.

list.split(" ")

Now I have an array of every word in the file. Neat. The ones with a tilde before them represent the category names.

Now I would like to break up this LARGE array into several smaller ones, based on category. I need help with the syntax here, although the pseudocode for this would be something like:

Scan the array for an element which begins with a tilde. Now create a new array based on the name of that element sans the tilde, and ALSO place this "category name" into the "categories" array. Now pull all the elements from the main array, and pop them into the sub-array, until you meet another tilde. Then repeat the process until there are no more elements in the array.
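
The pseudocode above could be sketched roughly as follows. This is only one possible reading of it: the sample string is a hypothetical stand-in for the contents of 'adjectivelist', the names words_by_category and current are invented for illustration, and the sub-arrays are kept in a Hash rather than in separately named variables. String#delete_prefix assumes Ruby 2.5 or newer.

```ruby
# Hypothetical sample standing in for the contents of 'adjectivelist'.
list = "~colors red green blue ~sizes tiny huge"

categories = []        # category names, in the order they appear
words_by_category = {} # category name => array of adjectives
current = nil          # name of the category currently being filled

list.split(" ").each do |word|
  if word.start_with?("~")
    # A tilde marks a new category: record its name without the tilde.
    current = word.delete_prefix("~")
    categories << current
    words_by_category[current] = []
  elsif current
    # Any other word belongs to the most recent category.
    words_by_category[current] << word
  end
end

p categories
p words_by_category
```

Words that appear before the first tilde are skipped in this sketch, since they belong to no category.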

Finally I would pull a random word from the category named in the parameter. If there was no category name matching the parameter, it would return false and exit (this is simply in case I want to add more categories later.)

Tips would be appreciated.

3 Solutions

#1


2  

Use slice_before:

categories = list.split(" ").slice_before(/~\w+/)

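This will create a sub-array for each word starting with ~, containing that word followed by every word up to (but not including) the next matching word. Note that slice_before returns an Enumerator, so call to_a on the result if you need an actual array.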
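
As a small illustration of slice_before, here is a sketch on a made-up input. The sample string is hypothetical, and the pattern is anchored with \A (a variation on the regex above) so that only words beginning with a tilde start a new slice.

```ruby
# Hypothetical sample standing in for the contents of 'adjectivelist'.
list = "~colors red green blue ~sizes tiny huge"

# slice_before starts a new slice at every element matching the pattern;
# to_a turns the resulting Enumerator into a plain array of arrays.
groups = list.split(" ").slice_before(/\A~/).to_a

p groups
# => [["~colors", "red", "green", "blue"], ["~sizes", "tiny", "huge"]]
```

Each sub-array still begins with its "~name" element, so the tilde has to be stripped separately if bare category names are wanted.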

#2


3  

You may want to go back and split the first time around like this:

categories = list.split(" ~")

Then each list item will start with the category name (the very first item keeps its leading tilde if the file begins with a category, so strip that one). This will save you having to go back through your data structure as you suggest. Consider that a tip: sometimes it's better to re-think the start of a coding problem than to head inexorably forwards.

The structure you are reaching towards is probably a Hash, where the keys are category names, and the values are arrays of all the matching adjectives. It might look like this:

{
  'category' => [ 'word1', 'word2', 'word3' ]
}

So you might do this:

words_in_category = Hash.new

categories.each do |category_string|
  cat_name, *words = category_string.split(" ")
  words_in_category[cat_name] = words
end
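
To see what this produces, here is the same loop run end to end on a made-up input. The sample string is hypothetical, and the delete_prefix call (Ruby 2.5+) is an addition that drops the very first tilde, which split(" ~") would otherwise leave attached to the first category name.

```ruby
# Hypothetical sample standing in for the contents of 'adjectivelist'.
list = "~colors red green blue ~sizes tiny huge"

# Drop the leading tilde so every chunk starts with a bare category name.
categories = list.delete_prefix("~").split(" ~")

words_in_category = {}
categories.each do |category_string|
  # The first word is the category name; the rest are its adjectives.
  cat_name, *words = category_string.split(" ")
  words_in_category[cat_name] = words
end

p words_in_category
# => {"colors"=>["red", "green", "blue"], "sizes"=>["tiny", "huge"]}
```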

Finally, to pick a random element from an array, Ruby provides a very useful method, Array#sample, so you can just do this:

words_in_category[ chosen_category ].sample

... assuming chosen_category contains the string name of an actual category. I'll leave it to you to figure out how to put this all together and handle errors, bad input, etc.
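
For reference, one way the pieces might fit together is sketched below. It is not the only way to handle bad input: the hash contents are hypothetical, the constant name WORDS_IN_CATEGORY and the optional table parameter are invented for illustration, and returning false for an unknown category simply follows what the question asked for.

```ruby
# Hypothetical data; in the real program this hash would be built from the file.
WORDS_IN_CATEGORY = {
  "colors" => ["red", "green", "blue"],
  "sizes"  => ["tiny", "huge"]
}

# Returns a random adjective from the named category,
# or false when the category is unknown or has no words.
def random_adjective(category, table = WORDS_IN_CATEGORY)
  words = table[category]
  return false if words.nil? || words.empty?
  words.sample
end

p random_adjective("colors") # one of "red", "green", "blue"
p random_adjective("shapes") # false, since no such category exists
```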

#3


1  

If this file format is your own and you are free to change it, then I recommend you save the data in YAML or JSON format and read it when needed. There are libraries to do this. That is all. No need to worry about the mess. Don't spend time reinventing the wheel.
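
As a rough sketch of the YAML route, using Ruby's standard yaml library: the document below is a hypothetical layout for the data, inlined as a string so the example is self-contained. With a real file you would read it with something like YAML.load_file instead.

```ruby
require "yaml"

# Hypothetical YAML document standing in for a file such as 'adjectives.yml'.
yaml_text = <<~YAML
  colors:
    - red
    - green
    - blue
  sizes:
    - tiny
    - huge
YAML

# Parsing yields a Hash of category name => array of adjectives.
words_in_category = YAML.safe_load(yaml_text)

p words_in_category["sizes"]
# => ["tiny", "huge"]
```

The result already has the category-to-words shape, so no manual splitting on tildes is needed.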
