I need to remove bullet symbols from text pasted from MS Word, but I can't figure out what to match on.
When printed to STDOUT, the symbol displays as â ¢. The ASCII codes for these characters are 194 and 162. The ASCII code for the • symbol is 149.
Any suggestions on how to proceed in either JavaScript or Ruby?
2 solutions
#1
In Ruby, you should be able to use something like:
mystring.gsub(/[\xxx]/,'whatever')
where xxx is the character code you are looking for. You can see what that code is by running puts mystring in irb, and it should show you the code.
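As a rough illustration of the gsub approach, here is a minimal sketch. It assumes the pasted text reaches Ruby as UTF-8 and that the bullet is U+2022; the variable names and sample string are made up for the example. It also assumes the 149 mentioned in the question is the Windows-1252 byte for the same bullet, which would explain why the codes differ depending on encoding.

```ruby
# Hypothetical sample input: a line pasted from Word that starts with a bullet.
pasted = "\u2022 First item"

# Inspect what is actually in the string before choosing a pattern.
p pasted.codepoints.first   # Unicode code point of the first character (8226 = U+2022)
p pasted.bytes.first(3)     # the three UTF-8 bytes that encode the bullet

# Assumption: byte 149 is the bullet in Windows-1252. Converting it to UTF-8
# yields the same U+2022 character.
from_cp1252 = [149].pack("C*").force_encoding("Windows-1252").encode("UTF-8")
p from_cp1252 == "\u2022"

# Remove the bullet and any whitespace that follows it.
cleaned = pasted.gsub(/\u2022\s*/, "")
puts cleaned   # prints First item
```

If the string arrives in a different encoding, the pattern would have to match that encoding's bytes instead, so checking bytes or codepoints first is the safer starting point.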
#2
I have had a similar problem with bullet points, including getting the symbols you describe. I tried a variety of regex filters and couldn't get anything to work, either on the bullet point or on the resulting â ¢ characters.
However, I did manage to find a way to filter the bullet point (or any similar character) using a custom method. It's not pretty or ideal, but it works:
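def strip_bullet_point(value)
  # Count the characters that precede the first ASCII letter.
  first_char = 0
  value.each_char { |c| c =~ /[A-Za-z]/ ? break : first_char += 1 }
  # Return everything from the first ASCII letter onward.
  value[first_char...value.length]
end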
This will also remove all preceding blanks and other non-alphabetic characters, since they also return nil for the =~ check.
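To make that side effect concrete, here is a short usage sketch. It repeats the method from this answer so that it runs on its own; the sample strings are invented for illustration.

```ruby
# Same method as in the answer, repeated so this snippet is self-contained.
def strip_bullet_point(value)
  first_char = 0
  value.each_char { |c| c =~ /[A-Za-z]/ ? break : first_char += 1 }
  value[first_char...value.length]
end

# The bullet and the space after it are dropped.
puts strip_bullet_point("\u2022 First item")   # prints First item

# Leading blanks, the dash, and the digits are dropped too,
# because none of them match /[A-Za-z]/.
puts strip_bullet_point("  - 42 items")        # prints items
```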
Do not use /[[:alpha:]]/ for the expression match, as that will consider the â ¢ characters as letters. Just note that /[A-Za-z]/ will give false negatives for non-English characters, such as 'ñ'.
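One possible way around that trade-off, offered only as a sketch and not as part of the answer above, is to strip a fixed list of bullet-like characters from the start of the string rather than everything before the first ASCII letter. The helper name, the constant, and the choice of characters in the list are all assumptions made for this example; extend the list to whatever your Word documents actually produce.

```ruby
# Hypothetical helper: removes only leading whitespace and a known set of
# bullet-like characters (bullet, middle dot, white bullet, triangular bullet,
# asterisk, hyphen). Everything else, including accented letters and digits,
# is left alone.
LEADING_BULLETS = /\A[\s\u2022\u00B7\u25E6\u2023*\-]+/

def strip_leading_bullets(value)
  value.sub(LEADING_BULLETS, "")
end

puts strip_leading_bullets("\u2022 ñandú")   # prints ñandú
puts strip_leading_bullets("  - 42 items")   # prints 42 items
```

The cost of this design is that any bullet character missing from the list is left in place, so it trades the false negatives on letters for false negatives on unusual bullets.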