Remove all characters except letters and digits from a Ruby string

Date: 2022-08-22 13:12:47

I have a string input field in a form, and I get its value in the params hash. How can I remove all characters except letters and digits from that string?

3 Answers

#1


52  

Just to remind people of good ol' tr:

asdf.tr('^A-Za-z0-9', '')

This takes the complement of the listed character ranges (the leading '^') and translates every character in it to '', which deletes it.

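As a minimal sketch of the tr call above (the sample string and variable names are made up for illustration), the following shows the complement being deleted while the original string is left untouched:

```ruby
# Hypothetical input; any string works here.
sample = 'Hello, World! 123_abc'

# The leading '^' selects every character NOT in A-Z, a-z or 0-9;
# translating that set to '' removes those characters.
cleaned = sample.tr('^A-Za-z0-9', '')
puts cleaned

# tr returns a new string, so the receiver is unchanged.
puts sample
```

If mutating the string in place is preferred, tr! does the same thing but returns nil when nothing was changed.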

I was curious whether a \W character class is faster than explicit ranges, and how gsub compares with tr:

require 'benchmark'

asdf = [('A'..'z').to_a, ('0'..'9').to_a].join

puts asdf
puts asdf.tr(   '^A-Za-z0-9',    '' )
puts asdf.gsub( /[\W_]+/,        '' )
puts asdf.gsub( /\W+/,           '' )
puts asdf.gsub( /\W/,            '' )
puts asdf.gsub( /[^A-Za-z0-9]+/, '' )
puts asdf.scan(/[a-z\d]/i).join

n = 100_000
Benchmark.bm(7) do |x|
  x.report("tr:")    { n.times do; asdf.tr('^A-Za-z0-9', '');      end }
  x.report("gsub1:") { n.times do; asdf.gsub(/[\W_]+/, '');        end }
  x.report("gsub2:") { n.times do; asdf.gsub(/\W+/, '');           end }
  x.report("gsub3:") { n.times do; asdf.gsub(/\W/, '');            end }
  x.report("gsub4:") { n.times do; asdf.gsub(/[^A-Za-z0-9]+/, ''); end }
  x.report("scan:")  { n.times do; asdf.scan(/[a-z\d]/i).join;     end }
end

>> ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>>              user     system      total        real
>> tr:      0.560000   0.000000   0.560000 (  0.557883)
>> gsub1:   0.510000   0.000000   0.510000 (  0.513244)
>> gsub2:   0.820000   0.000000   0.820000 (  0.823816)
>> gsub3:   0.960000   0.000000   0.960000 (  0.955848)
>> gsub4:   0.900000   0.000000   0.900000 (  0.902166)
>> scan:    5.630000   0.010000   5.640000 (  5.630990)

You can see that a couple of the patterns (/\W+/ and /\W/) don't catch the '_', because it is part of \w, and as a result they don't meet the OP's request.

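The underscore difference can be seen on a smaller input. This is only an illustrative sketch; the input string and variable names are assumptions, not part of the benchmark above:

```ruby
# Hypothetical input containing an underscore, a hyphen and a space.
input = 'snake_case-value 42'

# \W is the complement of \w, and \w includes '_', so the underscore survives.
keeps_underscore = input.gsub(/\W/, '')

# Adding '_' to the class, or spelling out the ranges, removes it as well.
strips_underscore = input.gsub(/[\W_]/, '')
explicit_ranges   = input.gsub(/[^A-Za-z0-9]/, '')

puts keeps_underscore
puts strips_underscore
puts explicit_ranges
```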

#2


21  

Without a regular expression:

garbage = 'ab_c<>?AB C!@#123'
puts garbage.delete("^a-zA-Z0-9") #=> abcABC123

Here the leading '^' negates the character set that follows it, so everything outside that set is deleted.

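Applied to the question's scenario, this might look like the sketch below. A plain Hash stands in for the params hash, and the :nickname key is a hypothetical field name, not something from the original post:

```ruby
# A plain Hash standing in for params; :nickname is a made-up key.
params = { nickname: 'ab_c<>?AB C!@#123' }

# to_s guards against a missing key, since nil.to_s is "".
clean   = params[:nickname].to_s.delete('^a-zA-Z0-9')
missing = params[:absent].to_s.delete('^a-zA-Z0-9')

# The caret only negates when it comes first; elsewhere it is a literal '^'.
literal_caret = 'a^b'.delete('a^')

puts clean
puts missing.empty?
puts literal_caret
```

delete returns a new string, so the value stored in the hash is not modified.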

#3


7  

'^/how/now#(Brown) Cow'.gsub(/\W/, '') # or /[\W_]/ to drop underscores too
#=> "hownowBrownCow"

...updated based on the comments...

