How to read Ctrl command characters from a file in Python

Right now I am trying to read and parse a file using Python 2. The creator of the file typed a number of lines in the terminal, with Ctrl-A characters within each line, and copied those lines into a text file. So the lines in the file look like "(something)^A(something)". When I use Python's readlines() function to read the file, those "^A" characters are not recognized.

I tried io.open and codecs.open with the encoding set to UTF-8, but "^A" is clearly not a UTF-8 string. Does anyone know how to read these special control characters from a file using Python? Thank you very much!

2 answers

#1

Simply read the file in binary mode, like so: open('file.txt', 'rb'). Ctrl-A will then appear as the byte with value 1 (0x01).

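with open('test.txt', 'rb') as f:
    text = f.read()

# Compare one-byte slices instead of iterating directly: iterating a byte
# string yields 1-char strings in Python 2 but ints in Python 3, where
# `char == b'\x01'` would never be true.
for i in range(len(text)):
    char = text[i:i + 1]
    if char == b'\x01':  # \x01 is the byte with hex value 01, i.e. Ctrl-A
        pass  # handle a Ctrl-A here
    else:
        pass  # handle any other byte here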
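
Since each line in the question has the form (something)^A(something), a natural next step is to split every line on the 0x01 byte. This is a minimal sketch that assumes fields are separated by single Ctrl-A bytes and that lines end in \n or \r\n; split_fields is a hypothetical helper name, not a library function:

```python
import io


def split_fields(lines):
    # Hypothetical helper: split each raw (bytes) line on the Ctrl-A byte 0x01.
    rows = []
    for line in lines:
        # Remove the line ending first so the last field has no trailing newline.
        line = line.rstrip(b'\r\n')
        rows.append(line.split(b'\x01'))
    return rows


# Usage: io.BytesIO stands in here for a file opened with open(path, 'rb').
sample = io.BytesIO(b'alpha\x01beta\ngamma\x01delta\n')
print(split_fields(sample))
```

For the real file, pass the file object returned by open(..., 'rb') instead of the BytesIO sample. The fields come back as byte strings, so decode them if you need text.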

#2

These control characters are part of the ASCII character set, with numeric codes ranging from 0 to 31 (00 to 1F in hexadecimal). To strip them from a string, simply use a regex substitution:

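import re
string_with_control_characters = 'foo\x01bar'  # example input containing a Ctrl-A
# The class [\x00-\x1f] also matches tab, newline and carriage return.
clean_string = re.sub(r'[\x00-\x1f]+', '', string_with_control_characters)  # 'foobar'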
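
The "^A" in the question is caret notation, the conventional way of displaying control code n as "^" followed by the character with code n + 64, so code 1 is shown as ^A. As a sketch (show_controls is a hypothetical helper name), the same character class can be reused with a replacement function to make control characters visible instead of deleting them:

```python
import re


def show_controls(text):
    # Hypothetical helper: replace each control character 0x00-0x1F with its
    # caret form, e.g. chr(1) -> '^A', chr(0) -> '^@', chr(31) -> '^_'.
    return re.sub(r'[\x00-\x1f]', lambda m: '^' + chr(ord(m.group(0)) + 64), text)


print(show_controls('foo\x01bar'))  # prints foo^Abar
```

Tab and newline fall inside the same range, so they are rendered as ^I and ^J as well; strip line endings first if that is not wanted.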
