Let's start with a simple type and work through an example:
>>> import copy
>>> inta = 10
>>> intb = copy.copy(inta)
>>> id(inta),id(intb)
(84312108, 84312108)  # same address, same content
>>> intb is inta
True
>>> intb += 1
>>> intb,inta
(11, 10)
>>> id(inta),id(intb)
(84312108, 84312096)  # inta: address and value unchanged; intb: now bound to a new object at a new address
After import copy, the function copy.copy() performs a shallow copy and the function copy.deepcopy() performs a deep copy.
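The process above can be pictured as follows: inta and intb start out referring to the same int object, 10. Only later, after intb += 1, does intb refer to a new object, 11.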
As for shallow copy, the original English explanation reads:
A shallow copy of an object is defined to be a newly created object
of the same type as the original object whose contents are
references to the elements in the original object. In other words,
the copied object itself is new, but the contents are not. Shallow
copies of sequence objects are the default type of copy and can
be made in any number of ways:
(1) taking a complete slice [:],
(2) using a factory function,
e.g., list(), dict(), etc.,
(3) using the copy() function of the copy module.
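In short, a shallow copy creates a new container object, but its contents are not new: the new object still refers to the elements of the original. For an immutable object such as an int there is nothing to share, so copy.copy() simply returns the same object. That is why intb is inta was True above: intb was just another name for the object 10 until it was rebound.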
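The three ways of making a shallow copy listed in the quote above can be sketched as follows. The variable names are my own and purely illustrative; the point is only that each approach yields a new outer list whose nested list is still shared with the original.

```python
import copy

original = ['la', 10000, ['lla', 99999]]

by_slice = original[:]          # (1) taking a complete slice
by_factory = list(original)     # (2) using a factory function
by_copy = copy.copy(original)   # (3) using copy.copy()

shallow_copies = [by_slice, by_factory, by_copy]

# Each copy is a new outer list object...
all_new = all(c is not original for c in shallow_copies)
# ...but the nested list inside it is the very same object as in the original.
inner_shared = all(c[2] is original[2] for c in shallow_copies)

print(all_new, inner_shared)  # True True
```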
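In C/C++, variables come first: a variable's address is allocated when it is defined, it can be assigned repeatedly, and its address never changes. Python, as I see it, puts data first: a name is only a reference to an object. When you try to "modify" a value of an immutable type, the name is rebound to a new object holding the result, and the original object is left untouched.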
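Python's immutable objects are mainly strings, numbers, and tuples (strictly, only tuples that contain nothing but immutable objects). In the code above, intb += 1 could not change the int in place, so intb was rebound to a new object holding the modified value, at a new address.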
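To make the difference between rebinding and in-place modification concrete, here is a minimal sketch. All names are hypothetical, chosen only for illustration. It contrasts += on an int with += on a list, and shows why a tuple holding a list is not fully immutable.

```python
# Immutable int: += rebinds the name to a new object.
n = 10
alias = n
n += 1
print(n, alias)               # 11 10

# Mutable list: += extends the existing object in place.
items = [1, 2]
items_alias = items
items += [3]
print(items_alias is items)   # True
print(items_alias)            # [1, 2, 3]

# A tuple's slots cannot be reassigned, but a mutable element can still change.
t = ('a', [1, 2])
t[1].append(3)
print(t)                      # ('a', [1, 2, 3])
```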
Before introducing deep copy, look at the following code:
>>> lista = ['la',10000,['lla',99999]]
>>> listb = lista;listc = copy.copy(lista);listd = copy.deepcopy(lista)
>>> lista;listb;listc;listd;[id(x) for x in (lista,listb,listc,listd)];[id(x) for x in lista];[id(x) for x in listb];[id(x) for x in listc];[id(x) for x in listd]
['la', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
[89793480, 89793480, 89662088, 89785168]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89793240]
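As you can see, the elements of the shallow copy listc have exactly the same addresses as those of lista, while the nested list in the deep copy listd has a different address. Note also that for the first two items of lista (a string and a number, both immutable types in Python) deep copy and shallow copy make no difference: both keep referring to the original objects.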
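The same observation can be checked without reading raw addresses, by comparing elements with is. This is only a sketch that rebuilds lista, listc, and listd from scratch; the names shared_with_shallow and shared_with_deep are my own.

```python
import copy

lista = ['la', 10000, ['lla', 99999]]
listc = copy.copy(lista)       # shallow copy
listd = copy.deepcopy(lista)   # deep copy

# For each position, is the element the very same object as in lista?
shared_with_shallow = [c is a for c, a in zip(listc, lista)]
shared_with_deep = [d is a for d, a in zip(listd, lista)]

print(shared_with_shallow)  # [True, True, True]
print(shared_with_deep)     # [True, True, False]
```

Only the mutable nested list is duplicated by the deep copy; the string and the number are shared in both cases.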
The original English description of deep copy:
In order to obtain a full or deep copy of the object—creating a
new container but containing references to completely new copies
(references) of the element in the original object—we need to use
the copy.deepcopy() function
In other words, a deep copy produces not only a new container object but also new copies of the mutable contents inside it.
An example (continuing the code above):
>>> lista[0]='lb'
>>> lista;listb;listc;listd;[id(x) for x in (lista,listb,listc,listd)];[id(x) for x in lista];[id(x) for x in listb];[id(x) for x in listc];[id(x) for x in listd]
['lb', 10000, ['lla', 99999]]
['lb', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
[89793480, 89793480, 89662088, 89785168]
[89778512, 89858768, 89649024]
[89778512, 89858768, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89793240]
>>> lista[1]=11111
>>> lista;listb;listc;listd;[id(x) for x in (lista,listb,listc,listd)];[id(x) for x in lista];[id(x) for x in listb];[id(x) for x in listc];[id(x) for x in listd]
['lb', 11111, ['lla', 99999]]
['lb', 11111, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
['la', 10000, ['lla', 99999]]
[89793480, 89793480, 89662088, 89785168]
[89778512, 89858852, 89649024]
[89778512, 89858852, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89793240]
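Assigning to the first two items of lista rebinds those slots, within lista itself, to new objects holding the new values. Neither the shallow copy listc nor the deep copy listd is affected.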
Next, modify the mutable nested list inside lista:
>>> lista[2][0]='llb'
>>> lista;listb;listc;listd;[id(x) for x in (lista,listb,listc,listd)];[id(x) for x in lista];[id(x) for x in listb];[id(x) for x in listc];[id(x) for x in listd]
['lb', 11111, ['llb', 99999]]
['lb', 11111, ['llb', 99999]]
['la', 10000, ['llb', 99999]]
['la', 10000, ['lla', 99999]]
[89793480, 89793480, 89662088, 89785168]
[89778512, 89858852, 89649024]
[89778512, 89858852, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89793240]
>>> lista[2][1]=88888
>>> lista;listb;listc;listd;[id(x) for x in (lista,listb,listc,listd)];[id(x) for x in lista];[id(x) for x in listb];[id(x) for x in listc];[id(x) for x in listd]
['lb', 11111, ['llb', 88888]]
['lb', 11111, ['llb', 88888]]
['la', 10000, ['llb', 88888]]
['la', 10000, ['lla', 99999]]
[89793480, 89793480, 89662088, 89785168]
[89778512, 89858852, 89649024]
[89778512, 89858852, 89649024]
[89780144, 89858768, 89649024]
[89780144, 89858768, 89793240]
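As shown, updating the nested list inside lista directly affects the shallow copy listc but cannot affect the deep copy listd. The underlying reason is that listc still shares the same nested list object with lista, whereas listd holds its own copy in a completely separate block of memory, so the deep copy does not change along with it.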
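The whole experiment can be condensed into a short script that runs on current Python versions. This is a sketch that rebuilds the four lists from scratch rather than continuing the session above.

```python
import copy

lista = ['la', 10000, ['lla', 99999]]
listb = lista                  # alias: same object
listc = copy.copy(lista)       # shallow copy: new outer list, shared nested list
listd = copy.deepcopy(lista)   # deep copy: new outer list, new nested list

lista[0] = 'lb'        # rebinds a slot of the outer list
lista[2][0] = 'llb'    # mutates the nested list in place

print(listb)  # ['lb', 10000, ['llb', 99999]]
print(listc)  # ['la', 10000, ['llb', 99999]]
print(listd)  # ['la', 10000, ['lla', 99999]]
```

The alias sees both changes, the shallow copy sees only the change inside the shared nested list, and the deep copy sees neither.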
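Finally, deep-copying an immutable type (for example, a tuple that contains only immutable objects) has the same effect as a shallow copy. Since the contents cannot change, there is no need to copy them. Any attempt to "change" the contents simply produces a new object with new data: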
deep copies of tuples are not made if they contain only atomic
objects. If we changed the banking information to a tuple, we
would get only a shallow copy even though we asked for a deep
copy
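A quick way to see this behavior is sketched below. The tuple names are hypothetical. The first result reflects how the standard library's copy module handles tuples whose elements all come back unchanged from the deep copy; I would treat it as an implementation detail rather than something to rely on.

```python
import copy

atomic_tuple = (1, 'two', 3.0)     # only immutable, atomic elements
mixed_tuple = (1, ['two', 3.0])    # contains a mutable list

atomic_copy = copy.deepcopy(atomic_tuple)
mixed_copy = copy.deepcopy(mixed_tuple)

# No new tuple is built when nothing inside needed copying.
print(atomic_copy is atomic_tuple)       # True

# A tuple holding a mutable element gets a new tuple and a new inner list.
print(mixed_copy is mixed_tuple)         # False
print(mixed_copy[1] is mixed_tuple[1])   # False
```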
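In Python, a list is mutable in place, which is why the addresses of the four objects lista, listb, listc, and listd never changed from start to finish. By contrast, at the beginning of this article, the moment intb's value changed, its address changed as well.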
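As an aside: assignment with the equals sign is neither a shallow copy nor a deep copy. It only creates another name (an alias) for the same object, which is rather like a reference in C++: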
>>> listb is lista;listc is lista;listd is lista
True
False
False