How do I split text into key-value pairs?

Time: 2022-03-28 20:27:35

I'm building a script to read and parse markdown files in Ruby. The script needs to be able to read and understand the multimarkdown header information at the top of the files so that it can perform additional actions on the output.

The header values look like this:

Title: My Treatise on Kumquats
Author: Joe Schmoe
Author URL: http://somedudeswebsite.me/
Host URL: http://googlesnewthing.com/
Created: 2012-01-01 09:41

I can't figure out how to split the lines of text into a simple key-value dictionary. The built-in split function doesn't seem to work in this case because I only want it to split on the first occurrence of a colon (:) in each line. Any additional colons should remain part of the value string.

In case it matters I'm using Ruby 1.8.7 on OS X.

5 Answers

#1


5  

Use split with an optional second parameter (thanks to @MichaelKohl)

s = 'Author URL: http://somedudeswebsite.me/'
key, value = s.split ': ', 2
puts key
puts value

Output

Author URL
http://somedudeswebsite.me/
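
For contrast, here is a small sketch of what the limit argument changes. Without it, the colon inside the URL becomes a split point as well; with a limit of 2, only the first ": " is used. The helper name `split_header_line` is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: split one header line at the first ": " only.
def split_header_line(line)
  line.chomp.split(': ', 2)
end

s = 'Author URL: http://somedudeswebsite.me/'

# Without a limit, the colon inside the URL is treated as a separator too.
p s.split(':')
# With a limit of 2, everything after the first ": " stays in the value.
p split_header_line(s)
```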

#2


7  

This does it:

s = <<EOS
Title: My Treatise on Kumquats
Author: Joe Schmoe
Author URL: http://somedudeswebsite.me/
Host URL: http://googlesnewthing.com/
Created: 2012-01-01 09:41
EOS

h = Hash[s.each_line.map { |l| l.chomp.split(': ', 2) }]
p h

Output:

{"Title"=>"My Treatise on Kumquats", "Author"=>"Joe Schmoe", "Author URL"=>"http://somedudeswebsite.me/", "Host URL"=>"http://googlesnewthing.com/", "Created"=>"2012-01-01 09:41"}
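
If the header is read from a real file, the one-liner may need a little more care. The following is only a sketch under two assumptions: that the header ends at the first blank line, and that lines without ": " should be skipped. `parse_header` is a hypothetical name, not something Ruby provides.

```ruby
# Hypothetical helper: build a Hash from the metadata block at the top of a
# document, stopping at the first blank line and skipping lines without ": ".
def parse_header(text)
  header = {}
  text.each_line do |line|
    line = line.chomp
    break if line.strip.empty?        # assumption: a blank line ends the header
    key, value = line.split(': ', 2)
    next if value.nil?                # no ": " on this line, so skip it
    header[key.strip] = value.strip
  end
  header
end

doc = <<EOS
Title: My Treatise on Kumquats
Author URL: http://somedudeswebsite.me/
Created: 2012-01-01 09:41

Body text: this line is not part of the header.
EOS

p parse_header(doc)
```

Building the hash in a loop rather than with Hash[...] makes it easy to drop malformed lines instead of raising on them.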

#3


1  

You can use regex to parse your text:

str = "Title: My Treatise on Kumquats
Author: Joe Schmoe
Author URL: http://somedudeswebsite.me/
Host URL: http://googlesnewthing.com/
Created: 2012-01-01 09:41"

matches = str.scan /^(.+?): (.+?)$/m

matches.each { |m|
   key = m[0]
   value = m[1]
}

The regex matches each line into two groups (with indexes 0 and 1). The first group contains all characters before the first occurrence of ": " (colon + space). The second group contains the remaining characters on that line (until the regex reaches the end of the line, $). Note that in Ruby ^ and $ already match at line boundaries; the /m flag only makes . match newlines as well, so it is not required here.
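
One detail worth checking: in Ruby, ^ and $ match at line boundaries regardless of flags, and /m only changes whether . may match a newline. The sketch below uses made-up input in which the first line has no ": ", and two hypothetical helpers that differ only in the /m flag.

```ruby
# Hypothetical helpers, identical except for the /m flag.
def scan_with_m(text)
  text.scan(/^(.+?): (.+?)$/m)
end

def scan_without_m(text)
  text.scan(/^(.+?): (.+?)$/)
end

text = "Title\nAuthor: Joe Schmoe"

p scan_with_m(text)     # the key can run across the line break
p scan_without_m(text)  # each match stays on a single line
```

On well-formed input both variants give the same result; the difference only shows up when a line lacks the separator.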

This is how you can convert result into Hash:

dictionary = matches.inject({}) do |dict, m| 
  dict[m[0]] = m[1]
  dict
end

UPDATE

As Michael Kohl mentioned, it is possible to write this in one line:

hash = Hash[str.scan /^(.+?): (.+?)$/m]

#4


0  

You can do this by finding the index of the first colon:

>> s = 'Author URL: http://somedudeswebsite.me/'
>> first_idx = s.index(':')
>> key,value = s[0..first_idx-1],s[first_idx+1..s.length]
=> ["Author URL", " http://somedudeswebsite.me/"]

or build a key-value hash with:

>> kv = Hash[s[0..first_idx-1], s[first_idx+1..s.length]]
=> {"Author URL"=>" http://somedudeswebsite.me/"}
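
Note that the value above keeps the space that follows the colon. A possible variation, sketched here with a hypothetical helper name, trims both parts and returns nil when the line has no colon at all:

```ruby
# Hypothetical helper: split at the first colon by index, then trim both parts.
def split_at_first_colon(line)
  idx = line.index(':')
  return nil if idx.nil?              # no colon on this line
  [line[0...idx].strip, line[(idx + 1)..-1].strip]
end

p split_at_first_colon('Author URL: http://somedudeswebsite.me/')
```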

Hope this helps

#5


0  

Is line.split(':',2) what you want?

String#split accepts a second argument that limits the number of parts the string is split into. It works in Ruby 1.9.3; I'm not sure about earlier versions (but I'm almost sure it also works in 1.9.2).

If this is not available, line.scan(%r{^([^:]*):(.*)}) should also work.
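
One thing to keep in mind with the scan fallback: it returns an array of matches, each of which is itself an array of groups, and the value keeps the space after the colon. A minimal sketch, assuming a single line of input and using a hypothetical wrapper name:

```ruby
# Hypothetical helper wrapping the scan fallback for a single line.
def split_with_scan(line)
  pair = line.scan(%r{^([^:]*):(.*)}).first
  return nil if pair.nil?             # the line contains no colon
  pair.map { |part| part.strip }
end

p split_with_scan('Host URL: http://googlesnewthing.com/')
```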
