Specifying the Python source file encoding from the command line

Date: 2022-09-21 00:12:11

PEP 263 specifies a syntax for declaring the encoding of a Python source file within the source file itself.

Is it possible to specify the encoding from the command line?

Or is there a reason why this might be undesirable?

I'm thinking of something like:

$ python --encoding utf-8 myscript.py

or even:

$ PYTHONSOURCEENCODING=utf-8 python myscript.py

2 answers

#1


3  

This is a hack and not really what you're looking for, and it doesn't work on systems that don't have sed, but you can prepend the coding line to any Python script with sed '1s/^/# -*- coding: utf-8 -*-\n/' script.py | python. (Treating \n in the replacement as a newline is GNU sed behavior; other sed implementations may differ.)

To make this more general, you can define a function in your .bashrc or profile.
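
The same idea can also be sketched in Python instead of a shell function. This is only an illustration of the hack above, not an established tool: the helper names compile_with_encoding and run_with_encoding are made up for this example, and, like the sed version, prepending a line shifts reported line numbers by one.

```python
def compile_with_encoding(source_bytes, encoding, filename="<script>"):
    """Compile raw script bytes as if they began with a PEP 263 coding line.

    Hypothetical helper mirroring the sed one-liner: the declaration is
    placed on line 1, so tracebacks are off by one line.
    """
    header = "# -*- coding: {} -*-\n".format(encoding).encode("ascii")
    # compile() honors a coding declaration when it is given bytes.
    return compile(header + source_bytes, filename, "exec")


def run_with_encoding(path, encoding):
    """Hypothetical helper: run the script at `path` with a forced source encoding."""
    with open(path, "rb") as f:
        code = compile_with_encoding(f.read(), encoding, path)
    # Run the script in a fresh namespace, roughly as `python path` would.
    exec(code, {"__name__": "__main__", "__file__": path})
```

With this saved in a module, run_with_encoding("myscript.py", "utf-8") would play the role of the imagined python --encoding utf-8 myscript.py.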

As an aside, I think the reason this wasn't implemented in the first place is that encoding is, and should be considered, a property of each file itself, not of the call that spawns the process. The conceptual spaces in which file encoding and process spawning exist are pretty different, at least to my thinking.

#2


1  

Although there could be special use cases where this feature could help, I think it could be confusing.

When you execute a Python script, there can be two different encodings:

  • the source script encoding, which can be defined in the script itself via PEP 263
  • the environment encoding, which can be defined through environment variables

The former is static in the script, and its only use is to allow the programmer to use non-ASCII characters in literal strings.
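
A small sketch of what the source encoding actually changes: the same bytes on disk become different string literals depending on the declared encoding. The helper name literal_under and the sample bytes are invented for this illustration.

```python
def literal_under(encoding, raw=b's = "\xc3\xa9"\n'):
    """Return the string literal in `raw` as read under a PEP 263 declaration.

    Hypothetical helper: it prepends a coding line and compiles the bytes,
    so the tokenizer decodes the literal with the declared encoding.
    """
    header = "# -*- coding: {} -*-\n".format(encoding).encode("ascii")
    namespace = {}
    exec(compile(header + raw, "<demo>", "exec"), namespace)
    return namespace["s"]


as_utf8 = literal_under("utf-8")      # bytes C3 A9 read as one character
as_latin1 = literal_under("latin-1")  # the same bytes read as two characters
print(len(as_utf8), len(as_latin1))   # prints: 1 2
```

Nothing here touches how the program reads or writes data at run time; only the decoding of the source text differs.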

The latter is what should be used for IO. It may change on different runs of the script.
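
To see the environment side, the sketch below starts a child interpreter with a different I/O encoding and asks it what it uses for standard output. The answer does not name a specific variable; using PYTHONIOENCODING here is an assumption made for the example (locale variables such as LANG or LC_ALL also influence the default), and stdout_encoding_of_child is a made-up helper name.

```python
import locale
import os
import subprocess
import sys

# What this process uses for text I/O; depends on how it was launched.
print(sys.stdout.encoding, locale.getpreferredencoding(False))


def stdout_encoding_of_child(io_encoding):
    """Hypothetical helper: report sys.stdout.encoding of a child interpreter
    started with PYTHONIOENCODING set to `io_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling stdout_encoding_of_child("latin-1") and stdout_encoding_of_child("utf-8") runs the same one-line script twice, yet the reported I/O encoding differs, which is the "may change on different runs" point above.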

If you pass the script encoding on the command line (or through environment variables), you invite confusion with the local runtime system encoding.
