I'm writing a Python program that logs terminal interaction (similar to the script program), and I'd like to filter out the VT100 escape sequences before writing to disk. I'd like to use a function like this:
def strip_escapes(buf):
    escape_regex = re.compile(???)  # <--- this is what I'm looking for
    return escape_regex.sub('', buf)
What should go in escape_regex?
3 Solutions
#1
4
The combined expression for escape sequences can be something generic like this:
(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]
It should be used with the re.I flag, so that lowercase final characters such as m also match.
This incorporates:
- Two-byte sequences, i.e. \x1b followed by a character in the range @ through _.
- One-byte CSI, i.e. \x9b as opposed to \x1b + "[".
However, this will not work for sequences that define key mappings or otherwise include strings wrapped in quotes.
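As a minimal sketch, the expression can be dropped into the strip_escapes function from the question. It assumes buf is a str rather than bytes (the question does not say which); for bytes, the pattern would need to be a bytes literal. The module-level name ESCAPE_REGEX is only illustrative.

```python
import re

# Pattern from this answer. re.I lets the final-character class [@-_]
# also match lowercase finals such as 'm' or 'h'.
# Assumption: the input is str, not bytes.
ESCAPE_REGEX = re.compile(r'(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]', re.I)

def strip_escapes(buf):
    """Return buf with every sequence matched by ESCAPE_REGEX removed."""
    return ESCAPE_REGEX.sub('', buf)

print(strip_escapes('\x1b[1;31mred\x1b[0m plain'))  # prints "red plain"
```

Compiling the pattern once at module level avoids recompiling it for every logged chunk.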
#2
2
VT100 codes are already grouped (mostly) according to similar patterns here:
http://ascii-table.com/ansi-escape-sequences-vt-100.php
I think the simplest approach would be to use a tool like RegexBuddy to define a regex for each group of VT100 codes.
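One way to picture this approach, without relying on any particular tool, is to write one small pattern per group and join them with |. The groups and names below (GROUP_PATTERNS, VT100_REGEX, strip_vt100) are illustrative assumptions that cover only a few common sequences, not the full table at the link above.

```python
import re

# Hypothetical per-group patterns; only a handful of common VT100/ANSI
# sequences are covered, so this is a partial sketch.
GROUP_PATTERNS = {
    'cursor_move': r'\x1b\[\d*[ABCD]',              # up, down, right, left
    'cursor_position': r'\x1b\[(?:\d*;\d*)?[Hf]',   # home, or row;column
    'erase': r'\x1b\[[0-2]?[JK]',                   # erase display / line
    'graphics': r'\x1b\[[\d;]*m',                   # colors and attributes
}

# Join the groups into a single alternation so one pass removes them all.
VT100_REGEX = re.compile('|'.join(GROUP_PATTERNS.values()))

def strip_vt100(buf):
    """Remove the sequences described by GROUP_PATTERNS from a str."""
    return VT100_REGEX.sub('', buf)

print(strip_vt100('\x1b[2J\x1b[1;1H\x1b[32mok\x1b[0m'))  # prints "ok"
```

Keeping the groups in a dictionary makes it easy to add or drop a group while checking the log output.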
#3
1
I found the following solution to successfully parse vt100 color codes and remove the non-printable escape sequences. The code snippet found here successfully removed all codes for me when running a telnet session using telnetlib:
def __processReadLine(self, line_p):
    '''
    remove non-printable characters from line <line_p>
    return a printable string.
    '''
    line, i, imax = '', 0, len(line_p)
    while i < imax:
        ac = ord(line_p[i])
        if (32 <= ac < 127) or ac in (9, 10):  # printable, \t, \n
            line += line_p[i]
        elif ac == 27:  # remove coded sequences
            i += 1
            # skip ahead to the letter that terminates the sequence
            while i < imax and line_p[i].lower() not in 'abcdhsujkm':
                i += 1
        elif ac == 8 or (ac == 13 and line and line[-1] == ' '):  # backspace or EOL spacing
            if line:
                line = line[:-1]
        i += 1
    return line