How to check, from the Linux shell, the encoding of a string generated by a Python script

Date: 2022-09-16 23:45:20

I run a Python script that generates a string and then executes a shell script using that string. I want to check the encoding of that string from the Linux shell, but without writing it to a file (disk operations are slow). Is it possible to check the encoding of a string in Linux (Ubuntu) using only RAM? Something like:

check-encoding 'My string with random encoding'

A Python script that checks the encoding is too slow as well.

1 solution

#1


Try the file utility. You can feed it any string without creating a file on disk: pipe the string from echo and pass - as the file argument (many commands accept a hyphen (-) in place of a filename to indicate that input should come from stdin rather than a file). The -i option makes file print the MIME type and charset:

:~  $ echo "test" | file -i -
/dev/stdin: text/plain; charset=us-ascii

:~  $ echo "тест" | file -i -
/dev/stdin: text/plain; charset=utf-8
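
Since the string is produced by a Python script, the same pipe can be set up from Python without touching the disk. The following is only a sketch under a few assumptions: the function name encoding_via_file is made up for illustration, the string is already available as bytes, and the -b (brief) and --mime-encoding options of file are used so that only the encoding name is printed. It returns None when file is not installed.

```python
import shutil
import subprocess
from typing import Optional


def encoding_via_file(data: bytes) -> Optional[str]:
    """Ask file(1) for the encoding of `data`, passing it on stdin only.

    Hypothetical helper: returns None if the `file` command is not on PATH.
    """
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        # "-" tells file to read from stdin; -b drops the "/dev/stdin:" prefix
        ["file", "-b", "--mime-encoding", "-"],
        input=data,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    print(encoding_via_file("тест".encode("utf-8")))
```

The data travels through a pipe, so nothing is written to a file. Note that starting a new process for every string has its own cost, which may matter if the check runs many times.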

To print only the charset, pipe the output to sed:

:~  $ echo "тест" | file -i - | sed 's/.*charset=\(.*\)/\1/'
utf-8

or to awk, which prints the whole third field including the charset= prefix (you can of course combine the two):

:~  $ echo "тест" | file -i - | awk '{ print $3 }'
charset=utf-8
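
If the output of file -i is read back into the Python script, the charset can be extracted there instead of with sed or awk. This is a minimal sketch; parse_charset is a hypothetical name, and it assumes the output line has the "charset=..." form shown above.

```python
from typing import Optional


def parse_charset(file_output: str) -> Optional[str]:
    """Return the text after 'charset=' in a `file -i` output line, or None."""
    _, sep, rest = file_output.partition("charset=")
    if not sep:
        return None  # the line did not contain a charset field
    return rest.strip() or None


if __name__ == "__main__":
    print(parse_charset("/dev/stdin: text/plain; charset=utf-8\n"))
```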

You can also use the Python chardet module. chardet comes with a command-line script that reports the encoding of one or more files. Install it with:

pip install chardet

and pipe the string to it from echo:

:~  $ echo "тест" | chardetect
<stdin>: utf-8 with confidence 0.938125
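
chardet can also be called in-process through its detect function, which avoids starting a shell at all. The sketch below is an illustration, not a recommendation from the answer: guess_encoding is a hypothetical name, and the fallback used when chardet is not installed (a strict UTF-8 decode) is an assumption of this example, not part of chardet.

```python
from typing import Optional


def guess_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of `data` in memory.

    Uses chardet.detect when chardet is installed; otherwise falls back
    to a strict UTF-8 check (this fallback is an assumption of the sketch).
    """
    try:
        import chardet  # third-party: pip install chardet
    except ImportError:
        chardet = None

    if chardet is not None:
        # detect() returns a dict with "encoding" and "confidence" keys
        return chardet.detect(data)["encoding"]

    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8; the fallback cannot say more
    return "ascii" if data.isascii() else "utf-8"


if __name__ == "__main__":
    print(guess_encoding("тест".encode("utf-8")))
```

Detection on very short strings is a statistical guess, which is why the command-line tool reports a confidence value alongside the encoding.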
