I have a very large .txt file and I want to write a Ruby script to filter some of its data. Basically, I want to iterate over each line, store the individual words of that line in an array, and then operate on the words. However, I am not able to get each word separately into an array.
tracker_file.each_line do |line|
  arr = "#{line}"
end
I can get the entire line like this, but how do I get the individual words?
Thanks
5 solutions
#1
3
Use the split method on a string.
irb(main):001:0> line = "one two three"
=> "one two three"
irb(main):002:0> line.split
=> ["one", "two", "three"]
So your example would be:
tracker_file.each_line do |line|
  arr = line.split
  # ... do stuff with arr
end
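Since the file in the question is very large, it may help to stream it rather than load it all at once. The following is only a sketch: the helper name `count_words` and the temporary file are made up for illustration, and word counting stands in for whatever operation is actually needed.

```ruby
require "tempfile"

# Hypothetical helper: tallies how often each whitespace-separated word occurs.
# File.foreach yields one line at a time, so the whole file is never held in memory.
def count_words(path)
  counts = Hash.new(0)
  File.foreach(path) do |line|
    line.split.each { |word| counts[word] += 1 }
  end
  counts
end

# Demo with a throwaway file standing in for the real tracker file.
Tempfile.create("tracker") do |f|
  f.write("one two three\ntwo three\nthree\n")
  f.flush
  p count_words(f.path)
end
```

If `tracker_file` is already an open File object, `each_line` as used above streams in the same way.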
#2
3
tracker_file.each_line do |line|
  line.scan(/[\w']+/) do |word|
    # ...
  end
end
If you do not need to iterate over lines, you can directly iterate over words:
tracker_file.read.scan(/[\w']+/) do |word|
  # ...
end
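To see how this differs from a plain split, here is a small comparison on a made-up sentence. The helper names are hypothetical; the pattern is the one used above.

```ruby
# Hypothetical helpers for comparing the two approaches on one line of text.
def words_by_whitespace(line)
  line.split              # punctuation stays attached to the words
end

def words_by_pattern(line)
  line.scan(/[\w']+/)     # keeps only runs of word characters and apostrophes
end

line = "Don't stop, it's fine."
p words_by_whitespace(line)  # ["Don't", "stop,", "it's", "fine."]
p words_by_pattern(line)     # ["Don't", "stop", "it's", "fine"]
```

Note that `tracker_file.read` loads the whole file into memory, which may matter for a very large file.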
#3
0
You can do:
tracker_file.each_line do |line|
  arr = line.split
  # Then perform operations on the array
end
The split method will split a string into an array based on a delimiter. Called with no argument, as here, it splits on runs of whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace.
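A few illustrative calls, with made-up input strings, show how the choice of delimiter changes the result:

```ruby
# No argument: split on runs of whitespace, ignoring leading/trailing whitespace.
p "  one\ttwo   three\n".split       # ["one", "two", "three"]

# String delimiter: empty fields in the middle are kept.
p "one,two,,three".split(",")        # ["one", "two", "", "three"]

# Regexp delimiter: several separators handled at once.
p "one, two;three".split(/[,;]\s*/)  # ["one", "two", "three"]
```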
#4
0
If you're reading English text that may contain hyphens, semicolons, spaces, periods, etc., you might consider a regular expression such as the following:
/[a-zA-Z]+(\-[a-zA-Z]+)*/
to extract the words instead.
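One caveat when using this pattern with String#scan: if the pattern contains a capturing group, scan returns the group captures rather than the whole match. A possible sketch, with a hypothetical constant and helper name, makes the group non-capturing with `(?:...)`:

```ruby
# Same pattern as above, but with a non-capturing group so that
# String#scan returns whole matches instead of group captures.
HYPHENATED_WORD = /[a-zA-Z]+(?:-[a-zA-Z]+)*/

# Hypothetical helper: returns the words, keeping hyphenated words intact.
def extract_words(text)
  text.scan(HYPHENATED_WORD)
end

p extract_words("A state-of-the-art design; well-known, too.")
# ["A", "state-of-the-art", "design", "well-known", "too"]
```

Because the character class is `[a-zA-Z]`, digits, apostrophes, and accented letters are treated as separators.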
#5
0
You don't have to use IO#each_line; you could also use IO#each(separator_string).
Another option is to use IO#gets:
while word = tracker_file.gets(separator_string)
  # IO#gets takes a String separator, not a Regexp; word still ends with it
  # use the word
end
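As a rough illustration of reading with a custom separator, the sketch below uses a StringIO in place of the opened file. The sample data and the helper name `each_chunk` are made up.

```ruby
require "stringio"

# Hypothetical helper: reads an IO-like object chunk by chunk using a
# String separator and strips that separator from the end of each chunk.
def each_chunk(io, separator)
  io.each(separator).map { |chunk| chunk.chomp(separator) }
end

# StringIO stands in for the opened tracker file.
p each_chunk(StringIO.new("alpha,beta,gamma"), ",")
# ["alpha", "beta", "gamma"]
```

With a single-space separator, newlines stay inside the chunks, so this approach fits best when the data uses one consistent separator.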