How Python Runs: A Brief Look at the pyc File Format

Date: 2022-02-12 17:07:17

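Python source code is compiled to bytecode before it runs, and the result of the compilation is held in PyCodeObject structures. A pyc file is what you get when those PyCodeObjects are written from memory to a file in marshal format.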
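
A minimal sketch of that relationship, written for Python 3 (the listings in this post are Python 2): it compiles a source string into a code object, serializes it with marshal, and loads it back. The source string and the variable names `source`, `code`, `data` and `restored` are illustrative and do not come from the original post.

```python
import marshal
import types

# Illustrative source text; any valid module source would do.
source = "def add(a):\n    b = 1\n    a += b\n    return a\n"

# compile() produces the in-memory code object for the module.
code = compile(source, "test.py", "exec")

# marshal.dumps() produces the kind of byte string a pyc file stores after its header.
data = marshal.dumps(code)

# marshal.loads() rebuilds an equivalent code object from those bytes.
restored = marshal.loads(data)

namespace = {}
exec(restored, namespace)  # run the module-level code, which defines add()
result = namespace["add"](41)
print(type(restored).__name__, result)
```

The marshal format is specific to the Python version that wrote it, so bytes produced this way should only be loaded by the same interpreter version.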


Let's use a small test and a tool to see what a pyc file actually contains.


First, a simple test program:

test.py

import dis
myglobal = True

def add(a):
    b = 1
    a += b
    return a

class world:
    def __init__(self):
        pass
    def sayHello(self):
        print 'hello,world'

w = world()
w.sayHello()



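This example contains a global variable, a function, and a class. It is Python 2 code, as the print statement shows. Compile it to a pyc file with the following command: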

python -m compileall test.py
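
The same step can also be done from code. The sketch below uses the standard `py_compile` module under Python 3; the temporary directory and the one-line module it writes are assumptions made only so that the example is self-contained.

```python
import os
import py_compile
import tempfile

# Write a throwaway module so the example does not depend on an existing test.py.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "test.py")
with open(src_path, "w") as f:
    f.write("myglobal = True\n")

# cfile names the output explicitly; without it Python 3 writes into __pycache__.
pyc_path = py_compile.compile(
    src_path, cfile=os.path.join(workdir, "test.pyc"), doraise=True
)

# The file starts with a 4-byte magic value that ends in "\r\n".
with open(pyc_path, "rb") as f:
    header = f.read(4)
print(pyc_path, header.hex())
```

Passing `doraise=True` makes a syntax error in the source raise an exception instead of only printing a message.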


Then we analyze test.pyc with the following code:

showfile.py

import dis, marshal, struct, sys, time, types

def show_file(fname):
    # A Python 2 pyc file starts with a 4-byte magic value and a 4-byte
    # modification time, followed by the marshalled code object.
    with open(fname, "rb") as f:
        magic = f.read(4)
        moddate = f.read(4)
        # '<L' is always a 4-byte little-endian integer; a bare 'L' uses the
        # native size, which is 8 bytes on some 64-bit platforms.
        modtime = time.asctime(time.localtime(struct.unpack('<L', moddate)[0]))
        print "magic %s" % (magic.encode('hex'))
        print "moddate %s (%s)" % (moddate.encode('hex'), modtime)
        code = marshal.load(f)
    show_code(code)

def show_code(code, indent=''):
    # Print one code object, then recurse into the code objects in co_consts.
    old_indent = indent
    print "%s<code>" % indent
    indent += ' '
    print "%s<argcount> %d </argcount>" % (indent, code.co_argcount)
    print "%s<nlocals> %d</nlocals>" % (indent, code.co_nlocals)
    print "%s<stacksize> %d</stacksize>" % (indent, code.co_stacksize)
    print "%s<flags> %04x</flags>" % (indent, code.co_flags)
    show_hex("code", code.co_code, indent=indent)
    print "%s<dis>" % indent
    dis.disassemble(code)
    print "%s</dis>" % indent

    print "%s<names> %r</names>" % (indent, code.co_names)
    print "%s<varnames> %r</varnames>" % (indent, code.co_varnames)
    print "%s<freevars> %r</freevars>" % (indent, code.co_freevars)
    print "%s<cellvars> %r</cellvars>" % (indent, code.co_cellvars)
    print "%s<filename> %r</filename>" % (indent, code.co_filename)
    print "%s<name> %r</name>" % (indent, code.co_name)
    print "%s<firstlineno> %d</firstlineno>" % (indent, code.co_firstlineno)

    print "%s<consts>" % indent
    for const in code.co_consts:
        if type(const) == types.CodeType:
            show_code(const, indent+' ')
        else:
            print " %s%r" % (indent, const)
    print "%s</consts>" % indent

    show_hex("lnotab", code.co_lnotab, indent=indent)
    print "%s</code>" % old_indent

def show_hex(label, h, indent):
    # Print a byte string as hex, wrapped at 60 characters per line.
    h = h.encode('hex')
    if len(h) < 60:
        print "%s<%s> %s</%s>" % (indent, label, h,label)
    else:
        print "%s<%s>" % (indent, label)
        for i in range(0, len(h), 60):
            print "%s %s" % (indent, h[i:i+60])
        print "%s</%s>" % (indent, label)

show_file(sys.argv[1])
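
The two header fields that show_file reads can also be decoded by hand. The sketch below is Python 3 and takes its bytes from the dump of test.pyc shown further down (magic 03f30d0a, moddate a2567054); the variable names are illustrative. The version number it extracts, 62211, is the value Python 2.7 uses.

```python
import struct
import time

# Header bytes as they appear in the article's dump of test.pyc.
magic = bytes.fromhex("03f30d0a")
moddate = bytes.fromhex("a2567054")

# The first two magic bytes are a little-endian version number, followed by "\r\n".
magic_number = struct.unpack("<H", magic[:2])[0]
# The timestamp is a 4-byte little-endian count of seconds since the epoch.
timestamp = struct.unpack("<L", moddate)[0]

print(magic_number, magic[2:])
print(timestamp, time.asctime(time.gmtime(timestamp)))
```

The second line prints the time in UTC, so the hour differs from the local time that showfile.py reports.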


Run the following command:

showfile.py test.pyc >test.xml


This saves the output to test.xml. Let's see what test.xml contains:

magic 03f30d0a
moddate a2567054 (Sat Nov 22 17:25:54 2014)
<code>
 <argcount> 0 </argcount>
 <nlocals> 0</nlocals>
 <stacksize> 3</stacksize>
 <flags> 0040</flags>
 <code>
  6400006401006c00005a00006501005a02006402008400005a0300640300
  640500640400840000830000595a04006504008300005a05006505006a06
  008300000164010053
 </code>
 <dis>
  1           0 LOAD_CONST               0 (-1)
              3 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (dis)
              9 STORE_NAME               0 (dis)

  2          12 LOAD_NAME                1 (True)
             15 STORE_NAME               2 (myglobal)

  4          18 LOAD_CONST               2 (<code object add at 024E3B60, file "test.py", line 4>)
             21 MAKE_FUNCTION            0
             24 STORE_NAME               3 (add)

  9          27 LOAD_CONST               3 ('world')
             30 LOAD_CONST               5 (())
             33 LOAD_CONST               4 (<code object world at 024E3650, file "test.py", line 9>)
             36 MAKE_FUNCTION            0
             39 CALL_FUNCTION            0
             42 BUILD_CLASS
             43 STORE_NAME               4 (world)

 15          46 LOAD_NAME                4 (world)
             49 CALL_FUNCTION            0
             52 STORE_NAME               5 (w)

 16          55 LOAD_NAME                5 (w)
             58 LOAD_ATTR                6 (sayHello)
             61 CALL_FUNCTION            0
             64 POP_TOP
             65 LOAD_CONST               1 (None)
             68 RETURN_VALUE
 </dis>
 <names> ('dis', 'True', 'myglobal', 'add', 'world', 'w', 'sayHello')</names>
 <varnames> ()</varnames>
 <freevars> ()</freevars>
 <cellvars> ()</cellvars>
 <filename> 'test.py'</filename>
 <name> '<module>'</name>
 <firstlineno> 1</firstlineno>
 <consts>
  -1
  None
  <code>
   <argcount> 1 </argcount>
   <nlocals> 2</nlocals>
   <stacksize> 2</stacksize>
   <flags> 0043</flags>
   <code> 6401007d01007c00007c0100377d00007c000053</code>
   <dis>
  5           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (b)

  6           6 LOAD_FAST                0 (a)
              9 LOAD_FAST                1 (b)
             12 INPLACE_ADD
             13 STORE_FAST               0 (a)

  7          16 LOAD_FAST                0 (a)
             19 RETURN_VALUE
   </dis>
   <names> ()</names>
   <varnames> ('a', 'b')</varnames>
   <freevars> ()</freevars>
   <cellvars> ()</cellvars>
   <filename> 'test.py'</filename>
   <name> 'add'</name>
   <firstlineno> 4</firstlineno>
   <consts>
    None
    1
   </consts>
   <lnotab> 000106010a01</lnotab>
  </code>
  'world'
  <code>
   <argcount> 0 </argcount>
   <nlocals> 0</nlocals>
   <stacksize> 1</stacksize>
   <flags> 0042</flags>
   <code> 6500005a01006400008400005a02006401008400005a03005253</code>
   <dis>
  9           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

 10           6 LOAD_CONST               0 (<code object __init__ at 024E3B18, file "test.py", line 10>)
              9 MAKE_FUNCTION            0
             12 STORE_NAME               2 (__init__)

 12          15 LOAD_CONST               1 (<code object sayHello at 024E3698, file "test.py", line 12>)
             18 MAKE_FUNCTION            0
             21 STORE_NAME               3 (sayHello)
             24 LOAD_LOCALS
             25 RETURN_VALUE
   </dis>
   <names> ('__name__', '__module__', '__init__', 'sayHello')</names>
   <varnames> ()</varnames>
   <freevars> ()</freevars>
   <cellvars> ()</cellvars>
   <filename> 'test.py'</filename>
   <name> 'world'</name>
   <firstlineno> 9</firstlineno>
   <consts>
    <code>
     <argcount> 1 </argcount>
     <nlocals> 1</nlocals>
     <stacksize> 1</stacksize>
     <flags> 0043</flags>
     <code> 64000053</code>
     <dis>
 11           0 LOAD_CONST               0 (None)
              3 RETURN_VALUE
     </dis>
     <names> ()</names>
     <varnames> ('self',)</varnames>
     <freevars> ()</freevars>
     <cellvars> ()</cellvars>
     <filename> 'test.py'</filename>
     <name> '__init__'</name>
     <firstlineno> 10</firstlineno>
     <consts>
      None
     </consts>
     <lnotab> 0001</lnotab>
    </code>
    <code>
     <argcount> 1 </argcount>
     <nlocals> 1</nlocals>
     <stacksize> 1</stacksize>
     <flags> 0043</flags>
     <code> 640100474864000053</code>
     <dis>
 13           0 LOAD_CONST               1 ('hello,world')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
     </dis>
     <names> ()</names>
     <varnames> ('self',)</varnames>
     <freevars> ()</freevars>
     <cellvars> ()</cellvars>
     <filename> 'test.py'</filename>
     <name> 'sayHello'</name>
     <firstlineno> 12</firstlineno>
     <consts>
      None
      'hello,world'
     </consts>
     <lnotab> 0001</lnotab>
    </code>
   </consts>
   <lnotab> 06010902</lnotab>
  </code>
  ()
 </consts>
 <lnotab> 0c010602090513060901</lnotab>
</code>
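
The lnotab strings in the dump map bytecode offsets to source lines. Below is a simplified Python 3 sketch of how to decode them, assuming the Python 2.7 layout of unsigned (offset increment, line increment) byte pairs. The helper name `decode_lnotab` is hypothetical, and the sketch ignores the special handling that increments larger than 255 need.

```python
def decode_lnotab(lnotab_hex, firstlineno):
    """Return sorted (bytecode offset, source line) pairs from a hex lnotab."""
    data = bytes.fromhex(lnotab_hex)
    offset, line = 0, firstlineno
    starts = {offset: line}
    # The table is a sequence of (offset increment, line increment) byte pairs.
    for i in range(0, len(data), 2):
        offset += data[i]
        line += data[i + 1]
        starts[offset] = line  # a later pair at the same offset wins
    return sorted(starts.items())

# lnotab and firstlineno of the module and of add(), copied from the dump.
module_table = decode_lnotab("0c010602090513060901", 1)
add_table = decode_lnotab("000106010a01", 4)
print(module_table)
print(add_table)
```

The decoded pairs agree with the line numbers in the left-hand column of the disassembly: in the module, for example, line 2 starts at offset 12 and line 15 at offset 46.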


Now compare this with the PyCodeObject structure defined in code.h in the Python source:

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;          /* #arguments, except *args */
    int co_nlocals;           /* #local variables */
    int co_stacksize;         /* #entries needed for evaluation stack */
    int co_flags;             /* CO_..., see below */
    PyObject *co_code;        /* instruction opcodes */
    PyObject *co_consts;      /* list (constants used) */
    PyObject *co_names;       /* list of strings (names used) */
    PyObject *co_varnames;    /* tuple of strings (local variable names) */
    PyObject *co_freevars;    /* tuple of strings (free variable names) */
    PyObject *co_cellvars;    /* tuple of strings (cell variable names) */
    /* The rest doesn't count for hash/cmp */
    PyObject *co_filename;    /* string (where it was loaded from) */
    PyObject *co_name;        /* string (name, for reference) */
    int co_firstlineno;       /* first source line number */
    PyObject *co_lnotab;      /* string (encoding addr<->lineno mapping) See
                                 Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;     /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist; /* to support weakrefs to code objects */
} PyCodeObject;
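
The co_flags field is a bitmask of CO_* constants. As a sketch, the Python 3 snippet below decodes the three flag values that appear in the dump (0040, 0043 and 0042). It uses the basic CO_* values from Python 2.7's code.h and leaves out the CO_FUTURE_* bits; the helper name `flag_names` is hypothetical.

```python
# Basic CO_* bit values as defined in Python 2.7's Include/code.h.
CO_FLAGS = {
    0x0001: "CO_OPTIMIZED",
    0x0002: "CO_NEWLOCALS",
    0x0004: "CO_VARARGS",
    0x0008: "CO_VARKEYWORDS",
    0x0010: "CO_NESTED",
    0x0020: "CO_GENERATOR",
    0x0040: "CO_NOFREE",
}

def flag_names(flags):
    """Return the names of the bits set in a co_flags value, lowest bit first."""
    return [name for bit, name in sorted(CO_FLAGS.items()) if flags & bit]

# Flag values copied from the dump: module, functions/methods, class body.
for value in (0x40, 0x43, 0x42):
    print("%04x" % value, flag_names(value))
```

None of the code objects in test.py sets CO_VARARGS or CO_VARKEYWORDS, which fits the fact that no function there takes *args or **kwargs.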


As we can see, test.pyc as a whole is a set of nested PyCodeObject structures. Every function, class body, and method gets its own PyCodeObject, and the module itself gets one more.
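
The nesting can be seen without a pyc file by walking co_consts directly. The sketch below is a Python 3 port of test.py (print becomes a function call, and the import and the two calls at the end are dropped). The generator name `walk` is hypothetical.

```python
import types

# Python 3 version of the article's test module, kept as a string.
source = '''
myglobal = True

def add(a):
    b = 1
    a += b
    return a

class world:
    def __init__(self):
        pass
    def sayHello(self):
        print('hello,world')
'''

def walk(code, depth=0):
    """Yield (nesting depth, co_name) for a code object and all nested ones."""
    yield depth, code.co_name
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from walk(const, depth + 1)

tree = list(walk(compile(source, "test.py", "exec")))
for depth, name in tree:
    print("  " * depth + name)
```

This is the same recursion that show_code performs in showfile.py, reduced to the names of the code objects.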

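Here is a brief explanation of each field:

argcount: number of positional arguments (*args and **kwargs are not counted)
nlocals: number of local variables, arguments included
stacksize: number of evaluation stack entries the code needs
flags: a bitmask of CO_* flags; among other things it records whether the function takes *args or **kwargs
code: the bytecode
names: names the bytecode refers to other than local variables: global variables, functions, classes, methods and attributes
varnames: names of the local variables, arguments included
consts: the constant table. marshal.c defines a one-character type code for every kind of object it can serialize: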

#define TYPE_NULL               '0'
#define TYPE_NONE               'N'
#define TYPE_FALSE              'F'
#define TYPE_TRUE               'T'
#define TYPE_STOPITER           'S'
#define TYPE_ELLIPSIS           '.'
#define TYPE_INT                'i'
#define TYPE_INT64              'I'
#define TYPE_FLOAT              'f'
#define TYPE_BINARY_FLOAT       'g'
#define TYPE_COMPLEX            'x'
#define TYPE_BINARY_COMPLEX     'y'
#define TYPE_LONG               'l'
#define TYPE_STRING             's'
#define TYPE_INTERNED           't'
#define TYPE_STRINGREF          'R'
#define TYPE_TUPLE              '('
#define TYPE_LIST               '['
#define TYPE_DICT               '{'
#define TYPE_CODE               'c'
#define TYPE_UNICODE            'u'
#define TYPE_UNKNOWN            '?'
#define TYPE_SET                '<'
#define TYPE_FROZENSET          '>'
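
These type codes show up as the first byte of each serialized object. A small Python 3 sketch follows, assuming marshal format version 2 (the version Python 2.7 writes) is requested explicitly so that the type byte carries no extra flag bits; the variable names are illustrative.

```python
import marshal

# Format version 2 is what Python 2.7 writes; newer versions may set a
# reference flag bit on the type byte, which would obscure the codes.
VERSION = 2

none_bytes = marshal.dumps(None, VERSION)    # TYPE_NONE
true_bytes = marshal.dumps(True, VERSION)    # TYPE_TRUE
int_bytes = marshal.dumps(1, VERSION)        # TYPE_INT + 4-byte value
float_bytes = marshal.dumps(1.5, VERSION)    # TYPE_BINARY_FLOAT + 8-byte value
tuple_bytes = marshal.dumps((), VERSION)     # TYPE_TUPLE + 4-byte length

for blob in (none_bytes, true_bytes, int_bytes, float_bytes, tuple_bytes):
    print(blob[:1], len(blob))
```

Each value starts with its type code, followed by a payload whose layout depends on the type.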


Every PyCodeObject is executed by calling the following function:

PyObject * PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)

This is one of Python's heavyweight functions. Its job is to execute the bytecode, and all Python code runs by way of a call to it.
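
To give a feel for what such an evaluation loop does, here is a toy stack machine in Python 3 that runs the bytecode of add() from the dump. It is only an illustration and is unrelated to the real C implementation. The opcode numbers are the Python 2.7 values for the five instructions add() uses, and the function name `run` is hypothetical.

```python
# Opcode numbers from Python 2.7's opcode table; only the five used by add().
LOAD_CONST, STORE_FAST, LOAD_FAST = 100, 125, 124
INPLACE_ADD, RETURN_VALUE = 55, 83
HAVE_ARGUMENT = 90  # opcodes >= 90 are followed by a 2-byte little-endian argument

def run(code_hex, consts, fast_locals):
    """Interpret a tiny subset of Python 2.7 bytecode given as a hex string."""
    code = bytes.fromhex(code_hex)
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        arg = None
        if op >= HAVE_ARGUMENT:
            arg = code[pc] | (code[pc + 1] << 8)
            pc += 2
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == LOAD_FAST:
            stack.append(fast_locals[arg])
        elif op == STORE_FAST:
            fast_locals[arg] = stack.pop()
        elif op == INPLACE_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == RETURN_VALUE:
            return stack.pop()
        else:
            raise NotImplementedError("opcode %d" % op)

# co_code and co_consts of add() as shown in the dump; the locals are [a, b].
result = run("6401007d01007c00007c0100377d00007c000053", (None, 1), [41, None])
print(result)
```

The loop fetches one opcode at a time, reads its argument if it has one, and updates the value stack and local variables accordingly, which is the same basic cycle the disassembly listings describe.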


After the analysis above, you should have a fairly clear picture of how Python code runs.


References:

http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html

http://daeken.com/2010-02-20_Python_Marshal_Format.html