Ruby: Replacing bullet symbols pasted from MS Word

Date: 2022-07-30 20:23:39

I need to remove bullet symbols from text pasted from MS Word, but I can't figure out what to match on.

When printed to STDOUT, the symbol displays as "â¢". The character codes for these characters are 194 and 162. The character code for the • symbol is 149.

Any suggestions on how to proceed in either JavaScript or Ruby?

2 Answers

#1


In Ruby, you should be able to use something like:

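mystring.gsub(/\u2022/, 'whatever')

where 2022 is the hexadecimal code point of the character you are looking for (here U+2022, the bullet, assuming the string is UTF-8). You can find the code by running p mystring.codepoints in irb, which lists the code points as decimal numbers (8226 for the bullet); a plain puts mystring only prints the character itself.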
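
As a rough illustration of that approach, here is a minimal sketch. It assumes the pasted text is either already UTF-8 or arrives as Windows-1252 bytes (where the bullet is byte 149); the helper name remove_bullets and the sample strings are made up for the example.

```ruby
# U+2022 BULLET plus any whitespace that follows it.
BULLET = /\u2022\s*/

# Hypothetical helper: removes every bullet from a UTF-8 string.
def remove_bullets(text)
  text.gsub(BULLET, '')
end

sample = "\u2022 First item"
p sample.codepoints.first(2)   # => [8226, 32]  (8226 == 0x2022)
puts remove_bullets(sample)    # prints "First item"

# Assumption: the raw input is Windows-1252, where the bullet is byte 149 (0x95).
# Transcode to UTF-8 first so the same pattern applies.
raw = "\x95 Second item".b.force_encoding('Windows-1252')
puts remove_bullets(raw.encode('UTF-8'))   # prints "Second item"
```

If the string's real encoding is unknown, checking mystring.encoding and mystring.bytes first should show which of the two cases applies.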

#2


I had a similar problem with bullet points, including getting the symbols you describe. I tried a variety of regex filters and couldn't get anything to work, either on the bullet point itself or on the resulting â¢ characters.

However, I did manage to find a way to filter the bullet point (or any similar character) using a custom method. It's not pretty or ideal, but it works:

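# Returns value with everything before its first ASCII letter removed.
def strip_bullet_point(value)
  first_char = 0
  # Count characters until the first ASCII letter is reached.
  value.each_char { |c| c =~ /[A-Za-z]/ ? break : first_char += 1 }

  value[first_char...value.length]
end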

This also removes any leading blanks and other non-alphabetic characters, since they also return nil for the =~ check.

Do not use /[[:alpha:]]/ for the match, as it treats the â in the â¢ sequence as a letter. Note that /[A-Za-z]/ does not match non-English letters such as 'ñ', so a value that begins with one will lose it.
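
One way to sidestep that trade-off is to strip only a known set of leading characters, not everything that is not an ASCII letter. The following is a sketch under assumptions, not a drop-in fix: the name strip_leading_bullet and the exact character set (bullet, middle dot, the mis-decoded "â¢" sequence, and whitespace) are illustrative guesses and may need adjusting for real data.

```ruby
# Assumed leading junk:
#   U+2022 bullet, U+00B7 middle dot, U+00A0 no-break space, ASCII whitespace,
#   and the mis-decoded bullet: U+00E2, an optional U+0080 or U+20AC, then U+00A2.
LEADING_BULLET = /\A(?:[\u2022\u00B7\u00A0\s]|\u00E2[\u0080\u20AC]?\u00A2)+/

# Hypothetical helper: removes leading bullet debris, leaving real letters alone.
def strip_leading_bullet(value)
  value.sub(LEADING_BULLET, '')
end

strip_leading_bullet("\u2022 ñandú")       # => "ñandú"
strip_leading_bullet("\u00E2\u00A2 item")  # => "item"
strip_leading_bullet("plain text")         # => "plain text"
```

Because the pattern lists what to remove instead of what to keep, a value that starts with 'ñ' or any other non-English letter is left intact.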

