Notepad++: how to remove all non-ASCII characters with a regex?

Date: 2022-01-30 20:24:49

I searched a lot, but nowhere is it written how to remove non-ASCII characters in Notepad++.

I need to know what to enter in Find and Replace (an answer with a picture would be great).

  • What if I want to make a whitelist and bookmark all the ASCII words/lines, so that the non-ASCII lines are left unmarked?

  • What if the file is quite large, so I can't select all the ASCII lines and just want to select the lines containing non-ASCII characters?

7 answers

#1


182  

This expression will search for non-ASCII values:

[^\x00-\x7F]+

Select 'Search Mode = Regular expression', and click Find Next.

Source: Regex any ascii character
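For anyone who wants to try the same character class outside Notepad++, here is a minimal sketch in Python. The function name `strip_non_ascii` is made up for illustration, and Python's `re` module is only assumed to treat this class the same way as Notepad++'s regex engine.

```python
import re

# Same character class as above: any run of characters outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Delete every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)


# "\u00e9" is e-acute, "\u00ef" is i-diaeresis, "\u2713" is a check mark.
sample = "caf\u00e9 na\u00efve \u2713 ok"
print(repr(strip_non_ascii(sample)))  # 'caf nave  ok'
```

Replacing with an empty string corresponds to leaving the "Replace with" field empty and clicking Replace All.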

#2


36  

In Notepad++, if you go to:

Search | Find characters in range | Non-ASCII Characters (128-255)

you can then step through the document to each non-ASCII character.

#3


16  

To remove all non-ASCII characters, you can use the following replacement: search for [^\x00-\x7F]+ and replace it with nothing.

To highlight characters, I recommend using the Mark function in the search window: it highlights the non-ASCII characters and puts a bookmark on each line containing one of them.

If you want to highlight and bookmark the ASCII characters instead, you can use the regex [\x00-\x7F].

Cheers
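The bookmarking idea can also be approximated in a script, which may help when the file is large. This is a rough Python sketch, not a Notepad++ feature; `lines_with_non_ascii` is a hypothetical helper that returns the line numbers a bookmark would land on.

```python
import re
from typing import List

# A single non-ASCII character is enough to flag a line.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text: str) -> List[int]:
    """Return the 1-based numbers of lines containing a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


# Lines 2 and 4 contain non-ASCII characters (i-diaeresis, check mark).
sample = "plain line\nna\u00efve line\nanother plain line\n\u2713 done"
print(lines_with_non_ascii(sample))  # [2, 4]
```

The lines not in the returned list are the pure-ASCII ones, which is the whitelist view of the same information.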

#4


14  

In addition to the answer by ProGM: if you see characters in boxes such as NUL or ACK and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:

[\x00-\x1F]+

To remove all non-ASCII characters AND ASCII control characters, remove everything matching this regex:

[^\x20-\x7E]+
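To see the difference between the two ranges, here is a small Python sketch. It assumes the goal is to keep only printable ASCII, that is 0x20 (space) through 0x7E (tilde); the constant names are made up for illustration.

```python
import re

# ASCII control characters 0-31 only.
CONTROL = re.compile(r"[\x00-\x1F]+")
# Everything that is not printable ASCII: control characters, DEL (0x7F)
# and all non-ASCII characters.
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]+")

# NUL and ACK after "ok", an e-acute, and a trailing DEL.
text = "ok\x00\x06 caf\u00e9\x7f"

only_controls_removed = CONTROL.sub("", text)
printable_only = NOT_PRINTABLE_ASCII.sub("", text)

print(repr(only_controls_removed))  # 'ok café\x7f'
print(repr(printable_only))         # 'ok caf'
```

Note that both classes also match line breaks (`\r` is 0x0D, `\n` is 0x0A), so removing every match joins all lines together.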

#5


4  

To keep new lines:

  1. First choose a placeholder character for new lines that does not otherwise occur in the file... I used #.
  2. Select the Replace option Extended.
  3. Find \n and replace it with #.
  4. Hit Replace All.

Next:

  1. Select the Replace option Regular expression.
  2. Input this: [^\x20-\x7E]+
  3. Keep Replace With empty.
  4. Hit Replace All.

Now select the Replace option Extended and replace # with \n.

:) Now you have a clean ASCII file ;)
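In a script, the placeholder round trip can be skipped by simply leaving the newline out of the character class. A minimal Python sketch of that variation (not the exact procedure above; `clean_keep_newlines` is a hypothetical name):

```python
import re

# Like [^\x20-\x7E]+, but "\n" is also excluded from the class,
# so line feeds survive without a placeholder character.
NOT_PRINTABLE_OR_NEWLINE = re.compile(r"[^\n\x20-\x7E]+")


def clean_keep_newlines(text: str) -> str:
    """Remove everything except printable ASCII and line feeds."""
    return NOT_PRINTABLE_OR_NEWLINE.sub("", text)


# Windows line ending, an e-acute, an i-diaeresis and a tab.
sample = "caf\u00e9\r\nna\u00efve\tline\n"
print(repr(clean_keep_newlines(sample)))  # 'caf\nnaveline\n'
```

As in the manual steps, carriage returns and tabs are still removed; add `\r` or `\t` to the class if they should be kept.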

#6


1  

Another good trick is to switch your editor to UTF-8 mode, so that you can actually see these funny characters and delete them yourself.

#7


1  

Another way...

  1. Install the TextFX plugin if you don't have it already.
  2. Go to the TextFX menu option -> zap all non printable characters to #. It will replace all invalid chars with 3 # symbols.
  3. Go to Find/Replace and look for ###. Replace it with a space.

This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.
