Iterating over each word in an array

I have a very large .txt file and I want to write a Ruby script to filter some of its data. Basically, I want to iterate over each line, store the individual words of the line in an array, and then operate on those words. However, I am not able to get each word separately into an array.

tracker_file.each_line do |line|
  arr = "#{line}"
end

I can get the entire line like this, but how do I get the individual words?

Thanks

5 solutions

#1

Use the split method on a string.

irb(main):001:0> line = "one two three"
=> "one two three"
irb(main):002:0> line.split
=> ["one", "two", "three"]

So your example would be:

tracker_file.each_line do |line|
  arr = line.split
  # ... do stuff with arr
end
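A minimal, self-contained sketch of this approach follows. It assumes a StringIO standing in for tracker_file so that it runs without a file on disk; the sample text and the words_per_line name are made up for illustration. With a real file you would typically obtain tracker_file from File.open instead.

```ruby
require "stringio"

# StringIO stands in for the real file here (an assumption for the demo).
tracker_file = StringIO.new("one two three\nfour  five\n")

words_per_line = []
tracker_file.each_line do |line|
  arr = line.split        # splits on whitespace; the trailing newline is dropped
  words_per_line << arr   # ... or operate on arr directly
end

p words_per_line
```

Because each_line reads one line at a time, only the current line and its words need to be in memory, which matters for a very large file.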

#2

tracker_file.each_line do |line|
  line.scan(/[\w']+/) do |word|
    ...
  end
end

If you do not need to iterate over lines, you can directly iterate over words:

tracker_file.read.scan(/[\w']+/) do |word|
  ...
end
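To see why scan with this pattern can be preferable to a plain split, here is a small sketch on an invented sample sentence. It compares the two on text that contains punctuation.

```ruby
text = "Don't panic: it's only a test, isn't it?\n"

# split only cuts on whitespace, so punctuation stays attached to the words.
split_words = text.split

# scan collects every run of word characters and apostrophes,
# which drops the surrounding punctuation.
scan_words = text.scan(/[\w']+/)

p split_words
p scan_words
```

Note that tracker_file.read loads the whole file into memory at once, so for a very large file the line-by-line variant is probably the safer choice.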

#3

You can do:

tracker_file.each_line do |line|
  arr = line.split
  # Then perform operations on the array
end

The split method splits a string into an array based on a delimiter; with no argument, as here, the delimiter is any run of whitespace.
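A short sketch of how the choice of delimiter affects the result; the sample strings are made up for illustration.

```ruby
line = "  alpha\tbeta  gamma\n"

# No argument: leading whitespace is ignored and any mix of spaces,
# tabs and newlines counts as one separator.
default_words = line.split

# An explicit delimiter is taken literally, so consecutive commas
# produce an empty string in the result.
csv_fields = "a,b,,c".split(",")

p default_words
p csv_fields
```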

#4

If you're reading English text that may contain hyphens, semicolons, spaces, periods, etc., you might consider a regular expression such as the following:

/[a-zA-Z]+(\-[a-zA-Z]+)*/

to extract the words instead.
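One way this pattern might be applied is with String#scan. The sketch below makes one change to the expression: the group is written as non-capturing, (?:...), because scan returns only the group captures when a pattern contains a capturing group. The WORD constant and the sample sentence are hypothetical names chosen for the example.

```ruby
# Same idea as the pattern above, with a non-capturing group so that
# String#scan returns the whole match rather than the group's capture.
WORD = /[a-zA-Z]+(?:-[a-zA-Z]+)*/

line = "A well-known, state-of-the-art parser; it works."
words = line.scan(WORD)

p words
```

Hyphenated words are kept whole, while commas, semicolons and periods are discarded.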

#5

You don't have to use IO#each_line; you could also use IO#each(separator_string).

Another option is to use IO#gets:

while word = tracker_file.gets(" ")  # the separator must be a String, not a Regexp
  # use the word (it still ends with the separator, so strip it if needed)
end
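A small sketch of both separator-based variants, assuming single-line, space-separated input and a StringIO standing in for tracker_file; the each_words and gets_words names are made up for illustration. Each chunk still carries its trailing separator, hence the strip.

```ruby
require "stringio"

# StringIO stands in for the real file here (an assumption for the demo).
tracker_file = StringIO.new("one two three")

# IO#each with a separator string yields one chunk per separator.
each_words = []
tracker_file.each(" ") { |chunk| each_words << chunk.strip }

# IO#gets with the same separator returns nil at end of input.
tracker_file.rewind
gets_words = []
while (chunk = tracker_file.gets(" "))
  gets_words << chunk.strip
end

p each_words
p gets_words
```

With multi-line input, newlines would remain inside the chunks, so this style fits best when the data really uses a single separator.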
