In this chapter
- Lists and tuples
- Strings
- Dictionaries
- Sets
- Files
- Character encoding and transcoding
1. Lists and tuples
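Lists are one of the most commonly used data types. They make it easy to store and modify a collection of values.
Defining a list
names = ['Lyon','wanmin','number']
Access elements by index. Indexing starts at 0, and negative indices count from the end.
>>> names[0]
'Lyon'
>>> names[2]
'number'
>>> names[-1]
'number'
>>> names[-2]
'wanmin'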
Slicing: taking several elements
>>> names = ["Lyon","Python","IT","age","wanmin"]
>>> names[1:4]    # elements from index 1 up to 4: includes 1, excludes 4
['Python','IT','age']
>>> names[1:-1]
['Python','IT','age']
>>> names[0:3]
['Lyon','Python','IT']
>>> names[:3]    # a leading 0 can be omitted; same result as the previous line
['Lyon','Python','IT']
>>> names[3:]    # to include the last element, leave the end index out instead of writing -1
['age','wanmin']
>>> names[3:-1]    # with -1 as the end index, the last element is excluded
['age']
>>> names[0::2]    # the trailing 2 is the step: take every second element
['Lyon','IT','wanmin']
>>> names[::2]    # same result as the previous line
['Lyon','IT','wanmin']
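A few slicing behaviours are not covered by the transcript above. The following is a minimal sketch using the same list; the variable names `clipped`, `reversed_copy` and `first_two` are only illustrative.

```python
# Same sample list as in the slicing examples.
names = ["Lyon", "Python", "IT", "age", "wanmin"]

# Out-of-range slice bounds are clipped instead of raising IndexError.
clipped = names[3:100]

# A negative step walks backwards, so [::-1] gives a reversed copy.
reversed_copy = names[::-1]

# A slice is a new list, so changing it leaves the original untouched.
first_two = names[:2]
first_two.append("extra")

print(clipped)        # ['age', 'wanmin']
print(reversed_copy)  # ['wanmin', 'age', 'IT', 'Python', 'Lyon']
print(len(names))     # 5
```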
Append
>>> names
['Lyon','Python','IT','age','wanmin']
>>> names.append("我是新来的")    # the string means "I am new here"
>>> names
['Lyon','Python','IT','age','wanmin','我是新来的']
Insert
>>> names
['Lyon','Python','IT','age','wanmin','我是新来的']
>>> names.insert(2,"强行从IT前面插入")    # 2 means: insert before the element currently at index 2
>>> names
['Lyon','Python','强行从IT前面插入','IT','age','wanmin','我是新来的']
Modify
>>> names
['Lyon','Python','强行从IT前面插入','IT','age','wanmin','我是新来的']
>>> names[2] = "我就是修改的"
>>> names
['Lyon','Python','我就是修改的','IT','age','wanmin','我是新来的']
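Delete
>>> del names[2]    # delete the element at the given index
>>> names
['Lyon','Python','IT','age','wanmin','我是新来的']
>>> names.remove("我是新来的")    # delete the first occurrence of the given value
>>> names
['Lyon','Python','IT','age','wanmin']
>>> names.pop()    # remove and return the last element
'wanmin'
>>> names
['Lyon','Python','IT','age']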
Extend
>>> names
['Lyon','Python','IT','age']
>>> b = [1,2,3]
>>> names.extend(b)
>>> names
['Lyon','Python','IT','age',1,2,3]
Copy
>>> names
['Lyon','Python','IT','age',1,2,3]
>>> name_copy = names.copy()    # copy() makes only a shallow copy; the details are left for later
>>> name_copy
['Lyon','Python','IT','age',1,2,3]
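The remark about `copy()` being shallow only matters once a list contains other mutable objects. The sketch below uses a hypothetical nested list (not from the original notes) to show the difference from `copy.deepcopy`.

```python
import copy

# Hypothetical nested data: the inner list is a shared mutable object.
names = ["Lyon", ["Python", "IT"]]

shallow = names.copy()        # copies the outer list only
deep = copy.deepcopy(names)   # copies the nested list as well

names[0] = "changed"          # rebinding a slot does not affect either copy
names[1].append("age")        # mutating the inner list shows up in the shallow copy

print(shallow)  # ['Lyon', ['Python', 'IT', 'age']]
print(deep)     # ['Lyon', ['Python', 'IT']]
```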
Count
>>> names
['Lyon','Python','IT','age',1,2,3]
>>> names.count('IT')    # number of times the value occurs
1
Sort & reverse
>>> names
['Lyon','Python','IT','age',1,2,3]
>>> names.sort()    # sort in place
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
# Python 3 no longer sorts values of different types together. Newer versions word this error as "'<' not supported between instances of 'int' and 'str'".
>>> names = ['Lyon','Python','IT','age',1,2,3]
>>> names
['Lyon', 'Python', 'IT', 'age', 1, 2, 3]
>>> names[-3] = '1'
>>> names[-2] = '2'
>>> names[-1] = '3'
>>> names
['Lyon', 'Python', 'IT', 'age', '1', '2', '3']
>>> names.sort()
>>> names
['1', '2', '3', 'IT', 'Lyon', 'Python', 'age']
>>> names.reverse()    # reverse in place
>>> names
['age', 'Python', 'Lyon', 'IT', '3', '2', '1']
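Rewriting the integers as strings by hand is one way around the TypeError. Another option, sketched here as an alternative rather than the approach used in these notes, is to pass a `key` function so that every element is compared by its string form.

```python
names = ["Lyon", "Python", "IT", "age", 1, 2, 3]

# key=str compares elements by their string form, so int and str can be mixed.
by_text = sorted(names, key=str)

# Lowercasing the key gives a case-insensitive order.
by_text_ci = sorted(names, key=lambda item: str(item).lower())

# reverse=True sorts in descending order without a separate reverse() call.
descending = sorted(names, key=str, reverse=True)

print(by_text)     # [1, 2, 3, 'IT', 'Lyon', 'Python', 'age']
print(by_text_ci)  # [1, 2, 3, 'age', 'IT', 'Lyon', 'Python']
print(descending)  # ['age', 'Python', 'Lyon', 'IT', 3, 2, 1]
```

`sorted` returns a new list and leaves `names` as it was; `names.sort(key=str)` would do the same ordering in place.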
Get the index
>>> names
['age', 'Python', 'Lyon', 'IT', '3', '2', '1']
>>> names.index("Lyon")    # returns only the index of the first match
2
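Tuples
A tuple is much like a list: it also stores a sequence of values. The difference is that once created it cannot be modified, which is why it is sometimes called a read-only list.
Syntax
names = ("Lyon","Wanmin","Python")
It has only two methods: count and index.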
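A short sketch of what "read-only" means in practice. The tuple below adds a duplicate `"Lyon"` purely so that `count` has something to report; the name `longer` is illustrative.

```python
# Illustrative tuple with one duplicate value.
names = ("Lyon", "Wanmin", "Python", "Lyon")

print(names.count("Lyon"))    # 2
print(names.index("Python"))  # 2

# Item assignment is rejected because tuples are immutable.
try:
    names[0] = "changed"
    mutated = True
except TypeError:
    mutated = False

# To "change" a tuple, build a new one instead.
longer = names + ("IT",)

print(mutated)  # False
print(longer)   # ('Lyon', 'Wanmin', 'Python', 'Lyon', 'IT')
```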
2. Strings
Property: strings are immutable, so every method below returns a new string instead of changing the original
>>> name = "lyon"
>>> name.capitalize()    # capitalize the first letter
'Lyon'
>>> name.casefold()    # convert everything to lowercase
'lyon'
>>> name.center(50,"-")    # center name in a width of 50 characters, padding with "-"
'-----------------------lyon-----------------------'
>>> name.count("y")    # count the occurrences of "y"
1
>>> name.encode()    # encode the string as bytes
b'lyon'
>>> name.endswith("n")    # whether the string ends with "n"
True
>>> name.endswith("L")
False
>>> "Lyo\tn".expandtabs(10)    # replace \t with spaces up to the next tab stop (every 10 columns)
'Lyo       n'
>>> name.find("L")    # search for "L": returns the index, or -1 if not found
-1
>>> name.find("l")
0
format
>>> name = "my name is {},and age is {}"
>>> name.format("Lyon",21)
'my name is Lyon,and age is 21'
>>> name = "my name is {1},and age is {0}"
>>> name.format("Lyon",21)
'my name is 21,and age is Lyon'
>>> name = "my name is {name},and age is {age}"
>>> name.format(age=21,name="Lyon")
'my name is Lyon,and age is 21'
>>> name.index("y")    # like find(), but raises ValueError when nothing is found
1
>>> '9aA'.isalnum()
True
>>> '9'.isdigit()    # whether every character is a digit
True
Other checks of the same kind: name.isnumeric(), name.isprintable(), name.isspace(), name.istitle(), name.isupper()
>>> "|".join(['alex','jack','rain'])
'alex|jack|rain'
maketrans
>>> intab = "aeiou"    # the characters to replace
>>> outtab = "12345"    # the replacement for each character, position by position
>>> trantab = str.maketrans(intab, outtab)
>>> text = "this is string example....wow!!!"    # named text so the built-in str is not shadowed
>>> text.translate(trantab)
'th3s 3s str3ng 2x1mpl2....w4w!!!'
>>> msg = 'my name is {name}, and age is {age}'
>>> msg.partition('is')    # split around the first occurrence of 'is'
('my name ', 'is', ' {name}, and age is {age}')
>>> "alex li, chinese name is lijie".replace("li","LI",1)    # replace at most 1 occurrence
'alex LI, chinese name is lijie'
>>> msg.swapcase()    # swap upper and lower case
'MY NAME IS {NAME}, AND AGE IS {AGE}'
>>> msg.zfill(40)    # pad with zeros on the left to a width of 40
'00000my name is {name}, and age is {age}'
>>> n4 = "Hello 2orld"
>>> n4.ljust(40,"-")
'Hello 2orld-----------------------------'
>>> n4.rjust(40,"-")
'-----------------------------Hello 2orld'
>>> b = "ddefdsdff_哈哈"
>>> b.isidentifier()    # whether the string is a valid identifier, i.e. follows the variable naming rules
True
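Since strings cannot be modified, "changing" one always means building a new string. A minimal sketch of that idea follows; the names `upper`, `assignment_error` and `rebuilt` are illustrative.

```python
name = "lyon"

# Methods return a new string; the original is left as it was.
upper = name.upper()

# Item assignment fails because str is immutable.
try:
    name[0] = "L"
    assignment_error = None
except TypeError as exc:
    assignment_error = type(exc).__name__

# Build a new string from pieces instead.
rebuilt = "L" + name[1:]
greeting = "my name is {name},and age is {age}".format(name=rebuilt, age=21)

print(name, upper)       # lyon LYON
print(assignment_error)  # TypeError
print(greeting)          # my name is Lyon,and age is 21
```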
3. Dictionaries
The key order in the output below comes from an older interpreter. Since Python 3.7, dicts keep insertion order.
>>> info = {
...     'one':"Lyon",
...     'two':"Wanmin",
...     'three':"IT",
... }
>>> info['four'] = "age"    # add a key
>>> info
{'two': 'Wanmin', 'one': 'Lyon', 'four': 'age', 'three': 'IT'}
>>> info['three'] = "Job"    # change the value of an existing key
>>> info
{'two': 'Wanmin', 'one': 'Lyon', 'four': 'age', 'three': 'Job'}
>>> info.pop("three")    # remove a key and return its value
'Job'
>>> info
{'two': 'Wanmin', 'one': 'Lyon', 'four': 'age'}
>>> del info["four"]
>>> info
{'two': 'Wanmin', 'one': 'Lyon'}
>>> info.popitem()    # remove and return one (key, value) pair
('two', 'Wanmin')
>>> info
{'one': 'Lyon'}
>>> "one" in info    # membership test on keys
True
>>> info.get("one")
'Lyon'
>>> info1 = {
...     'one':"Lyon",
...     'two':"Wanmin",
...     'three':"IT",
...     'four':"age",
...     'five':"sex",
... }
>>> info.get("six")    # get() returns None for a missing key
>>> info1["six"]    # indexing a missing key raises an error; get() does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'six'
>>> info1.keys()
dict_keys(['two', 'one', 'four', 'five', 'three'])
>>> info1.values()
dict_values(['Wanmin', 'Lyon', 'age', 'sex', 'IT'])
>>> info2 = {1:2,2:3,3:4}
>>> info1.update(info2)    # merge info2 into info1
>>> info1
{1: 2, 2: 3, 3: 4, 'two': 'Wanmin', 'four': 'age', 'five': 'sex', 'one': 'Lyon', 'three': 'IT'}
Looping
for key in info:
    print(key,info[key])
for k,v in info.items():    # in Python 2, items() first builds a full list, so avoid it for large dicts; in Python 3 it returns a lightweight view
    print(k,v)
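The lookups and loops above can be combined into a few common patterns. This is a sketch with illustrative names (`fallback`, `pairs`); it assumes Python 3.7 or later for the insertion-ordered output.

```python
info = {"one": "Lyon", "two": "Wanmin", "three": "IT"}

# get() returns None, or a chosen default, instead of raising KeyError.
missing = info.get("six")
fallback = info.get("six", "unknown")

# setdefault() inserts a value only when the key is absent.
info.setdefault("four", "age")
info.setdefault("one", "ignored")   # "one" already exists, so nothing changes

# items() yields (key, value) pairs.
pairs = list(info.items())

# Iterate over a snapshot of the keys when deleting inside the loop.
for key in list(info):
    if key == "two":
        del info[key]

print(missing, fallback)  # None unknown
print(info)               # {'one': 'Lyon', 'three': 'IT', 'four': 'age'}
```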
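4. Sets
A set is an unordered collection of unique items. Its main uses are:
- Removing duplicates: converting a list to a set drops the duplicates automatically. (Dictionaries deduplicate too, since keys are unique.)
- Relationship testing: computing the intersection, difference, union and similar relations between two groups of data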
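Both uses can be sketched in a few lines. The data and names below (`visits`, `group_a`, `group_b`) are hypothetical; the order-preserving variant relies on dict keys keeping insertion order in Python 3.7 and later.

```python
# Hypothetical list with repeated entries.
visits = ["Lyon", "Wanmin", "Lyon", "Python", "Wanmin"]

# Converting to a set removes duplicates but does not keep the order.
unique = set(visits)

# dict.fromkeys keeps the first occurrence of each item in order.
unique_ordered = list(dict.fromkeys(visits))

# Relationship tests between two groups.
group_a = {"Lyon", "Wanmin"}
group_b = {"Wanmin", "Python"}
both = group_a & group_b      # in both groups
only_a = group_a - group_b    # in group_a but not in group_b
either = group_a | group_b    # in at least one group

print(len(unique))      # 3
print(unique_ordered)   # ['Lyon', 'Wanmin', 'Python']
print(both, only_a)     # {'Wanmin'} {'Lyon'}
```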
Common operations
>>> a = set([1,3,5,7,8,9,10])    # create a new set from a list
>>> b = set("Lyon")    # create a set of characters
>>> c = a | b    # union of a and b
>>> c
{1, 3, 5, 7, 8, 9, 10, 'o', 'L', 'n', 'y'}
>>> d = c & a    # intersection of c and a
>>> d
{1, 3, 5, 7, 8, 9, 10}
>>> e = c - a    # difference: in c but not in a
>>> e
{'o', 'L', 'n', 'y'}
>>> A = a ^ b    # symmetric difference: items that are in exactly one of a and b
>>> A
{1, 3, 5, 7, 8, 'o', 9, 10, 'L', 'n', 'y'}
>>> a.add("Lyon")    # add one item
>>> a
{1, 3, 5, 7, 8, 9, 10, 'Lyon'}
>>> b.update([5,6,7,8,9])    # add several items
>>> b
{5, 6, 7, 8, 'o', 9, 'L', 'n', 'y'}
>>> b.remove("o")    # remove one item
>>> b
{5, 6, 7, 8, 9, 'L', 'n', 'y'}
>>> len(b)    # size of the set
8
>>> 5 in b    # is 5 in set b?
True
>>> 1 not in b    # is 1 absent from set b?
True
>>> A.issubset(b)    # is every element of A also in b? Same as A <= b
False
>>> A.issuperset(b)    # is every element of b also in A? Same as A >= b
False
>>> A.union(b)    # new set with every element of A and b; same as A | b
{1, 3, 5, 6, 7, 8, 'o', 9, 10, 'L', 'n', 'y'}
>>> A.intersection(b)    # same as A & b
{5, 7, 8, 9, 'L', 'n', 'y'}
>>> b.difference(A)    # same as b - A
{6}
>>> A.symmetric_difference(b)    # same as A ^ b
{1, 3, 6, 'o', 10}
>>> A.copy()    # shallow copy
{1, 3, 5, 7, 8, 'o', 9, 10, 'L', 'n', 'y'}
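5. File operations
- Open the file and assign the returned file handle to a variable
- Operate on the file through the handle
- Close the file
Basic usage
f = open('lyrics',encoding="utf-8")    # the mode defaults to 'r', so the file can be read without writing 'r'
first_line = f.readline()    # read one line
print("The first line:",first_line)
print("____divider___".center(50,'-'))
data = f.read()    # read everything that remains
print(data)
f.close()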
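The file modes are:
- r, read-only (the default).
- w, write-only. (Not readable. The file is created if it does not exist; if it exists, its contents are erased.)
- a, append. (Not readable. The file is created if it does not exist; otherwise every write goes to the end.)
"+" means the file can be read and written at the same time
- r+, read and write. (The file must exist and is not truncated.)
- w+, write and read. (The file is created or truncated first.)
- a+, append and read. (Like a, but the file can also be read.)
"U" means that \r, \n and \r\n are all converted to \n when reading (used together with r or r+). It is deprecated in Python 3, where text mode already translates newlines by default, and was removed in Python 3.11.
- rU
- r+U
"b" means the file is handled as binary data (for example, uploading an ISO image over FTP). In Python 2 the flag could be left out on Linux but was required on Windows for binary files; in Python 3 it matters on every platform, because binary mode reads and writes bytes instead of str.
- rb
- wb
- ab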
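The differences between w, a and r+ are easiest to see by running them in sequence. The sketch below writes to a throwaway temporary directory; the file name `lyrics` is reused from the example above, and the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lyrics")

    with open(path, "w", encoding="utf-8") as f:    # creates the file
        f.write("line 1\n")

    with open(path, "a", encoding="utf-8") as f:    # writes go to the end
        f.write("line 2\n")

    with open(path, "r+", encoding="utf-8") as f:   # read and write, no truncation
        before = f.read()
        f.write("line 3\n")   # the position is at the end after read()

    with open(path, encoding="utf-8") as f:
        middle = f.read()

    with open(path, "w", encoding="utf-8") as f:    # "w" erases the old contents
        f.write("fresh\n")

    with open(path, encoding="utf-8") as f:
        after = f.read()

print(repr(before))  # 'line 1\nline 2\n'
print(repr(middle))  # 'line 1\nline 2\nline 3\n'
print(repr(after))   # 'fresh\n'
```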
Other methods
def close(self): # real signature unknown; restored from __doc__
    """
    Close the file.
    A closed file cannot be used for further I/O operations. close() may be
    called more than once without error.
    """
    pass
def fileno(self, *args, **kwargs): # real signature unknown
    """ Return the underlying file descriptor (an integer). """
    pass
def isatty(self, *args, **kwargs): # real signature unknown
    """ True if the file is connected to a TTY device. """
    pass
def read(self, size=-1): # known case of _io.FileIO.read
    """
    Note: this may return less data than requested.
    Read at most size bytes, returned as bytes.
    Only makes one system call, so less data may be returned than requested.
    In non-blocking mode, returns None if no data is available.
    Return an empty bytes object at EOF.
    """
    return ""
def readable(self, *args, **kwargs): # real signature unknown
    """ True if file was opened in a read mode. """
    pass
def readall(self, *args, **kwargs): # real signature unknown
    """
    Read all data from the file, returned as bytes.
    In non-blocking mode, returns as much as is immediately available,
    or None if no data is available. Return an empty bytes object at EOF.
    """
    pass
def readinto(self): # real signature unknown; restored from __doc__
    """ Same as RawIOBase.readinto(). """
    pass # reads bytes into a pre-allocated buffer; rarely needed in everyday code
def seek(self, *args, **kwargs): # real signature unknown
    """
    Move to new file position and return the file position.
    Argument offset is a byte count. Optional argument whence defaults to
    SEEK_SET or 0 (offset from start of file, offset should be >= 0); other
    values are SEEK_CUR or 1 (move relative to current position, positive or
    negative), and SEEK_END or 2 (move relative to end of file, usually
    negative, although many platforms allow seeking beyond the end of a file).
    Note that not all file objects are seekable.
    """
    pass
def seekable(self, *args, **kwargs): # real signature unknown
    """ True if file supports random-access. """
    pass
def tell(self, *args, **kwargs): # real signature unknown
    """
    Current file position.
    Can raise OSError for non seekable files.
    """
    pass
def truncate(self, *args, **kwargs): # real signature unknown
    """
    Truncate the file to at most size bytes and return the truncated size.
    Size defaults to the current file position, as returned by tell().
    The current file position is changed to the value of size.
    """
    pass
def writable(self, *args, **kwargs): # real signature unknown
    """ True if file was opened in a write mode. """
    pass
def write(self, *args, **kwargs): # real signature unknown
    """
    Write bytes b to file, return number written.
    Only makes one system call, so not all of the data may be written.
    The number of bytes actually written is returned. In non-blocking mode,
    returns None if the write would block.
    """
    pass
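The stubs above describe the raw `_io.FileIO` class. In everyday code `open(path, "rb+")` returns a buffered wrapper that offers the same `seek`, `tell` and `truncate` methods, so the sketch below uses that. The file name `data.bin` and the variable names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.bin")

    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb+") as f:
        f.seek(4)                   # absolute offset from the start
        chunk = f.read(3)           # reads bytes 4, 5 and 6
        position = f.tell()         # position after the read

        f.seek(-2, os.SEEK_END)     # offset relative to the end of the file
        tail = f.read()

        f.seek(5)
        f.truncate()                # cut the file at the current position
        f.seek(0)
        remaining = f.read()

print(chunk, position)  # b'456' 7
print(tail)             # b'89'
print(remaining)        # b'01234'
```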
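The with statement
To avoid forgetting to close a file after opening it, manage it with a context manager:
with open('log','r') as f:
    ...
With this form, the file is closed and its resources are released automatically when the with block finishes.
Since Python 2.7, with can also manage several file contexts at once:
with open('log1') as obj1, open('log2') as obj2:
    pass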
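A runnable sketch of the two-file form: it copies one file into another inside a temporary directory and then checks that both handles were closed. The file names `log1` and `log2` follow the example above; everything else is illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "log1")
    dst_path = os.path.join(tmp_dir, "log2")

    with open(src_path, "w", encoding="utf-8") as f:
        f.write("first\nsecond\n")

    # Both files are opened together and closed together when the block ends.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.upper())

    closed_after_block = src.closed and dst.closed

    with open(dst_path, encoding="utf-8") as f:
        copied = f.read()

print(closed_after_block)  # True
print(repr(copied))        # 'FIRST\nSECOND\n'
```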
6. Character encoding and transcoding
Further reading:
http://www.cnblogs.com/yuanchenqi/articles/5956943.html
http://www.diveintopython3.net/strings.html
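Things to know:
- In Python 2 the default source encoding is ASCII; in Python 3 it is UTF-8, and str holds Unicode text
- Unicode has several encodings: UTF-32 (always 4 bytes per character), UTF-16 (2 or 4 bytes) and UTF-8 (1 to 4 bytes). UTF-16 is common as an in-memory representation on some platforms, but files are usually stored as UTF-8 because it saves space
- In Python 3, encode converts a str to bytes while encoding it, and decode converts bytes back to a str while decoding
Converting between encodings generally works in two steps: first decode the bytes into Unicode, then encode the Unicode text into the encoding you need. For example, ASCII cannot represent Chinese, so supporting Chinese requires transcoding: whether the target is GBK or UTF-8, the ASCII data is first decoded to Unicode, and the Unicode text is then encoded as GBK or UTF-8.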
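The decode-then-encode route can be sketched in Python 3 as follows, assuming the standard `gbk` and `utf-8` codecs that ship with CPython. The sample text and variable names are illustrative.

```python
text = "中文"   # a two-character sample meaning "Chinese"

# str -> bytes: encode the Unicode text as GBK (2 bytes per character here).
gbk_bytes = text.encode("gbk")

# GBK -> UTF-8: decode back to Unicode first, then encode to the target.
unicode_text = gbk_bytes.decode("gbk")
utf8_bytes = unicode_text.encode("utf-8")

# ASCII cannot represent Chinese characters at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(gbk_bytes), len(utf8_bytes))  # 4 6
print(unicode_text == text)             # True
print(ascii_ok)                         # False
```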
Note: this article is only a set of study notes and excerpts.
Original source: http://www.cnblogs.com/alex3714/articles/5717620.html