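Python Sequences: Strings

This article covers strings, one of Python's sequence types.

1. Strings

Strings support the standard sequence operations: indexing, slicing, concatenation, repetition, and membership tests.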
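As a quick illustration of those sequence operations, here is a minimal sketch. The variable names are only for illustration, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
# Strings behave like other sequences.
s = 'furzoom'

first = s[0]             # indexing: 'f'
last = s[-1]             # negative indexes count from the end: 'm'
middle = s[1:4]          # slicing: 'urz'
joined = s + '.com'      # concatenation: 'furzoom.com'
repeated = 'ab' * 3      # repetition: 'ababab'
has_zoom = 'zoom' in s   # membership test: True

print(joined)
```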

1.1 Predefined strings in the string module

>>> import string
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.digits
'0123456789'
>>>

1.2 Ordinary strings and Unicode strings

>>> u'Hello' + ' furzoom'
u'Hello furzoom'
>>>

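1.3 String-only operations

String format conversion symbols

Format symbol	Conversion
%c	Converts to a character (an integer ASCII code or a string of length 1)
%r	Converts to a string using repr()
%s	Converts to a string using str()
%d/%i	Converts to a signed decimal integer
%u	Converts to an unsigned decimal integer (obsolete; behaves like %d)
%o	Converts to an unsigned octal integer
%x/%X	Converts to an unsigned hexadecimal integer (lowercase/uppercase)
%e/%E	Converts to scientific notation
%f/%F	Converts to a floating-point number
%g/%G	Shorthand for %e or %f / %E or %F, whichever is shorter
%%	Outputs a literal %

Format operator auxiliary directives

Symbol	Effect
*	Width or precision is taken from the argument tuple
-	Left-justifies the value
+	Shows a plus sign (+) before positive numbers
<sp>	Shows a space before positive numbers
0	Pads numbers with zeros instead of spaces
#	Prefixes octal numbers with 0 and hexadecimal numbers with 0x or 0X
(var)	Mapping key (dictionary argument)
m.n	m is the minimum total width; n is the number of digits after the decimal point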
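The auxiliary directives are easier to read side by side. The following is a small sketch rather than an exhaustive reference; the variable names are illustrative. The octal form of # is left out on purpose, because its output differs between Python 2 and Python 3.

```python
# Each line exercises one auxiliary directive of the % operator.
star_width = '%*d' % (6, 42)        # width from the tuple: '    42'
star_prec = '%.*f' % (2, 3.14159)   # precision from the tuple: '3.14'
left = '%-6d|' % 42                 # left-justified: '42    |'
plus = '%+d' % 42                   # explicit sign: '+42'
space = '% d' % 42                  # space before a positive number: ' 42'
zero = '%06d' % 42                  # zero padding: '000042'
alt_hex = '%#x' % 255               # hexadecimal prefix: '0xff'
width_prec = '%8.3f' % 3.14159      # m.n form: '   3.142'

print(width_prec)
```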

>>> '%x' % 108
'6c'
>>> '%X' % 108
'6C'
>>> '%#X' % 108
'0X6C'
>>> '%#x' % 108
'0x6c'
>>> '%f' % 1234.567890
'1234.567890'
>>> '%.2f' % 1234.567890
'1234.57'
>>> '%E' % 1234.567890
'1.234568E+03'
>>> '%e' % 1234.567890
'1.234568e+03'
>>> '%g' % 1234.567890
'1234.57'
>>> '%G' % 1234.567890
'1234.57'
>>> '%e' % 111111111111111111111
'1.111111e+20'
>>> 'Welcome to %(website)s, %(name)s' % {'name': 'mn', 'website': 'furzoom.com'}
'Welcome to furzoom.com, mn'
>>> from string import Template
>>> s = Template('There are ${howmany} ${lang} Quotation Symbols')
>>> print s.substitute(lang='Python', howmany=3)
There are 3 Python Quotation Symbols
>>> 
>>> print s.substitute(lang='Python')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/string.py", line 172, in substitute
    return self.pattern.sub(convert, self.template)
  File "/usr/lib/python2.7/string.py", line 162, in convert
    val = mapping[named]
KeyError: 'howmany'
>>> 
>>> print s.safe_substitute(lang='Python')
There are ${howmany} Python Quotation Symbols
>>>
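The difference between substitute() and safe_substitute() shown above can be wrapped in a small helper. This is only a sketch: render() is a hypothetical name, not part of the string module, and falling back silently is a design choice that may not suit every program.

```python
from string import Template

s = Template('There are ${howmany} ${lang} Quotation Symbols')


def render(template, **values):
    """Hypothetical helper: strict substitution first, lenient as fallback."""
    try:
        # substitute() raises KeyError when a placeholder has no value.
        return template.substitute(**values)
    except KeyError:
        # safe_substitute() leaves unknown placeholders untouched.
        return template.safe_substitute(**values)


complete = render(s, lang='Python', howmany=3)
partial = render(s, lang='Python')
print(complete)
print(partial)
```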

1.4 Raw strings

>>> '\n'
'\n'
>>> print '\n'


>>> r'\n'
'\\n'
>>> print r'\n'
\n
>>>
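Raw strings matter most where backslashes carry meaning for another layer, such as regular expressions. The sketch below uses the standard re module; the sample text is made up for illustration.

```python
import re

# In an ordinary string '\n' is one character; in a raw string it is two.
ordinary_len = len('\n')    # 1
raw_len = len(r'\n')        # 2

text = 'welcome to furzoom.com'

# r'\b' reaches the regex engine as a word-boundary assertion.
match = re.search(r'\bfurzoom\b', text)
found = match.group(0)

# Without the r prefix, '\b' is a backspace character, so nothing matches.
no_match = re.search('\bfurzoom\b', text)

print(found)
```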

1.5 Unicode string operators

When writing a raw Unicode string, the u must come before the r.

>>> ur'hello\nfurzoom'
u'hello\\nfurzoom'
>>> ru'hello\nmn'
  File "<stdin>", line 1
    ru'hello\nmn'
                ^
SyntaxError: invalid syntax
>>>

2. Built-in functions

2.1 Standard type functions and sequence functions

cmp()
len()
max()
min()
enumerate()
zip()

>>> s1 = 'furzoom'
>>> s2 = 'abcdefg'
>>> cmp(s1, s2)
1
>>> cmp(s2, s1)
-1
>>> cmp(s1, 'furzoom')
0
>>> len(s1)
7
>>> max(s1)
'z'
>>> min(s1)
'f'
>>> us1 = u'furzoom'
>>> len(us1)
7
>>> us1
u'furzoom'
>>> print us1
furzoom
>>> min(us1)
u'f'
>>> max(us1)
u'z'
>>> for i, t in enumerate(s1):
...     print i, t
... 
0 f
1 u
2 r
3 z
4 o
5 o
6 m
>>> zip(s2, s1)
[('a', 'f'), ('b', 'u'), ('c', 'r'), ('d', 'z'), ('e', 'o'), ('f', 'o'), ('g', 'm')]
>>>
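cmp() exists only in Python 2. For readers on Python 3, the three-way comparison can be approximated as below. compare() is a hypothetical stand-in rather than a built-in, and list() is used because zip() and enumerate() return iterators on Python 3.

```python
def compare(a, b):
    """Hypothetical stand-in for cmp(): returns -1, 0, or 1."""
    return (a > b) - (a < b)


s1 = 'furzoom'
s2 = 'abcdefg'

greater = compare(s1, s2)       # 1, because 'f' sorts after 'a'
smaller = compare(s2, s1)       # -1
equal = compare(s1, 'furzoom')  # 0

# Pair each index with its character, and pair characters position by position.
indexed = list(enumerate(s1))
pairs = list(zip(s2, s1))

print(pairs[0])
```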

2.2 String type functions

raw_input()
str()
unicode()
chr()
unichr()
ord()

For unichr(), the valid argument range depends on how Python was built: range(0x10000), that is range(65536), for a UCS2 build, and range(0x110000) for a UCS4 build.

>>> name = raw_input("Enter your name: ")
Enter your name: furzoom MN
>>> name
'furzoom MN'
>>> len(name)
10
>>> unicode(name)
u'furzoom MN'
>>> str(unicode(name))
'furzoom MN'
>>>
>>> isinstance(u'\xAB', str)
False
>>> isinstance('mn', unicode)
False
>>> isinstance(u'', unicode)
True
>>> isinstance('mn', str)
True
>>> chr(65)
'A'
>>> ord('a')
97
>>> unichr(12345)
u'\u3039'
>>> chr(12345)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(256)
>>> ord(u'\uffff')
65535
>>>
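ord() and chr()/unichr() are inverses of each other, which the following sketch checks. Because unichr() exists only in Python 2, the snippet falls back to chr() on Python 3; to_unicode_char is an illustrative name, not a built-in.

```python
try:
    to_unicode_char = unichr   # Python 2
except NameError:
    to_unicode_char = chr      # Python 3: chr() covers all code points

code = 12345                   # 0x3039 in hexadecimal
ch = to_unicode_char(code)
round_trip = ord(ch)           # back to 12345

# Build 'abcde' by offsetting from the code of 'a'.
letters = ''.join(chr(ord('a') + i) for i in range(5))

print(letters)
```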

3. String built-in methods

string.capitalize()
string.center(width[, fillchar])
string.count(sub[, start[, end]])
string.decode([encoding[, errors]])
string.encode([encoding[, errors]])
string.endswith(suffix[, start[, end]])
string.expandtabs([tabsize])
string.find(sub[, start[, end]])
string.format(*args, **kwargs)
string.index(sub[, start[, end]])
string.isalnum()
string.isalpha()
string.isdigit()
string.islower()
string.isspace()
string.istitle()
string.isupper()
string.join(iterable)
string.ljust(width[, fillchar])
string.lower()
string.lstrip([chars])
string.partition(sep)
string.replace(old, new[, count])
string.rfind(sub[, start[, end]])
string.rindex(sub[, start[, end]])
string.rjust(width[, fillchar])
string.rpartition(sep)
string.rsplit([sep[, maxsplit]])
string.rstrip([chars])
string.split([sep[, maxsplit]])
string.splitlines([keepends])
string.startswith(prefix[, start[, end]])
string.strip([chars])
string.swapcase()
string.title()
string.translate(table[, deletechars])
string.upper()
string.zfill(width)

string.format() will be covered later.

>>> s = 'welcome to visit furzoom.com'
>>> s.capitalize()
'Welcome to visit furzoom.com'
>>> s.center(50)
'           welcome to visit furzoom.com           '
>>> s.center(50, '#')
'###########welcome to visit furzoom.com###########'
>>> s.count('om')
3
>>> s.count('om', -10)
2
>>> s.count('om', 0, 10)
1
>>> s.decode()
u'welcome to visit furzoom.com'
>>> s.decode().encode()
'welcome to visit furzoom.com'
>>> s.endswith('com')
True
>>> s.endswith('')
True
>>> s.endswith('mn')
False
>>> s.endswith('co', 0, -1)
True
>>> s1 = '1\t23\t456\t789'
>>> s1.expandtabs()
'1       23      456     789'
>>> s1.expandtabs(4)
'1   23  456 789'
>>> s1.expandtabs(3)
'1  23 456   789'
>>> s1.expandtabs(5)
'1    23   456  789'
>>> s1.expandtabs(6)
'1     23    456   789'
>>> s.find('om')
4
>>> s.find('mn')
-1
>>> s.find('om', 5)
22
>>> s.index('om')
4
>>> s.index('mn')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> '234'.isalnum()
True
>>> s.isalnum()
False
>>> ''.isalnum()
False
>>> s.isalpha()
False
>>> 'furzoom'.isalpha()
True
>>> s.isdigit()
False
>>> '234'.isdigit()
True
>>> ''.isdigit()
False
>>> s.islower()
True
>>> '234'.islower()
False
>>> s.isspace()
False
>>> ' \t'.isspace()
True
>>> s.istitle()
False
>>> 'Welcome To Furzoom'.istitle()
True
>>> s.isupper()
False
>>> 'MN'.isupper()
True
>>> '#'.join([str(i) for i in range(10)])
'0#1#2#3#4#5#6#7#8#9'
>>> s.ljust(40)
'welcome to visit furzoom.com            '
>>> s.ljust(40, '#')
'welcome to visit furzoom.com############'
>>> s.lower()
'welcome to visit furzoom.com'
>>> ss = s.center(40)
>>> ss
'      welcome to visit furzoom.com      '
>>> ss.lstrip()
'welcome to visit furzoom.com      '
>>> ss.lstrip(' we')
'lcome to visit furzoom.com      '
>>> s.partition('om')
('welc', 'om', 'e to visit furzoom.com')
>>> s.partition('mn')
('welcome to visit furzoom.com', '', '')
>>> s.replace('o', '#')
'welc#me t# visit furz##m.c#m'
>>> s.replace('o', '#', 3)
'welc#me t# visit furz#om.com'
>>> s.rfind('o')
26
>>> s.rfind('o', 25)
26
>>> s.rfind('o', -3)
26
>>> s.rfind('o', -3, -20)
-1
>>> s.rfind('o', 5, 15)
9
>>> s.rindex('om')
26
>>> s.rjust(40)
'            welcome to visit furzoom.com'
>>> s.rjust(40, '#')
'############welcome to visit furzoom.com'
>>> s.rpartition('oom')
('welcome to visit furz', 'oom', '.com')
>>> s.rsplit()
['welcome', 'to', 'visit', 'furzoom.com']
>>> s.rsplit(' ', 2)
['welcome to', 'visit', 'furzoom.com']
>>> ss.rstrip()
'      welcome to visit furzoom.com'
>>> ss.rstrip(' m')
'      welcome to visit furzoom.co'
>>> 'ab\n\nde fg\rhi\r\n'.splitlines()
['ab', '', 'de fg', 'hi']
>>> 'ab\n\nde fg\rhi\r\n'.splitlines(True)
['ab\n', '\n', 'de fg\r', 'hi\r\n']
>>> ''.splitlines()
[]
>>> ''.split('\n')
['']
>>> 'line\n'.split('\n')
['line', '']
>>> 'line\n'.splitlines()
['line']
>>> s.startswith('wel')
True
>>> s.startswith(' ')
False
>>> ss.strip()
'welcome to visit furzoom.com'
>>> ss.strip(' wm')
'elcome to visit furzoom.co'
>>> s.swapcase()
'WELCOME TO VISIT FURZOOM.COM'
>>> s.title()
'Welcome To Visit Furzoom.Com'
>>> s.title().swapcase()
'wELCOME tO vISIT fURZOOM.cOM'
>>> s.translate(None, 'aeiou')
'wlcm t vst frzm.cm'
>>> import string
>>> s.translate(string.maketrans('aeiou', '12345'))
'w2lc4m2 t4 v3s3t f5rz44m.c4m'
>>> s.upper()
'WELCOME TO VISIT FURZOOM.COM'
>>> s.zfill(40)
'000000000000welcome to visit furzoom.com'
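Several of these methods are typically combined when parsing text. The sketch below assumes a simple "key = value" line format; parse_setting() is a hypothetical helper written for this example, not a library function.

```python
def parse_setting(line):
    """Hypothetical helper: split 'key = value' into a (key, value) tuple."""
    key, sep, value = line.partition('=')
    if not sep:
        # partition() returns an empty separator when '=' is missing.
        raise ValueError('no "=" in line: %r' % line)
    return key.strip(), value.strip()


key, value = parse_setting('  site = furzoom.com  ')

text = 'welcome to visit furzoom.com'

# find() reports a missing substring with -1; index() raises ValueError.
pos = text.find('mn')
try:
    text.index('mn')
    index_raised = False
except ValueError:
    index_raised = True

# split() and join() undo each other for single-space-separated words.
words = text.split()
rejoined = ' '.join(words)

print(key + ' -> ' + value)
```

Choosing between find() and index() is mostly a matter of style: find() suits code that tests the result, while index() suits code that treats a missing substring as an error.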

4. String-specific features

4.1 Escape characters

Escape sequence	Hex value
\0	0x00
\a	0x07
\b	0x08
\t	0x09
\n	0x0A
\v	0x0B
\f	0x0C
\r	0x0D
\x1b (ESC; Python has no \e escape)	0x1B
\"	0x22
\'	0x27
\\	0x5C

>>> print 'aaa\b\bbb'
abb
>>> print 'aaaaaaa\rbbc'
bbcaaaa
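The hexadecimal values in the table can be checked with ord(). In this sketch the ESC character is written as '\x1b', since Python string literals do not recognize '\e'; the dictionary name is illustrative.

```python
# Map each escaped character to the code point listed in the table.
escapes = {
    '\0': 0x00, '\a': 0x07, '\b': 0x08, '\t': 0x09,
    '\n': 0x0A, '\v': 0x0B, '\f': 0x0C, '\r': 0x0D,
    '\x1b': 0x1B, '\"': 0x22, '\'': 0x27, '\\': 0x5C,
}

# Collect any entry whose actual code point differs from the table.
mismatches = [ch for ch, code in escapes.items() if ord(ch) != code]

print(len(escapes))
```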

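4.2 Triple quotes

A triple-quoted string can span several lines and contain newlines, tabs, and other special characters literally. It is often used to embed HTML or SQL statements.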
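A minimal sketch of a triple-quoted SQL statement follows. The table and column names are invented for the example.

```python
# Line breaks and single quotes are kept exactly as typed.
query = """SELECT name, site
FROM users
WHERE name = 'mn'"""

lines = query.splitlines()
has_newline = '\n' in query

print(lines[0])
```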

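4.3 Strings are an immutable data type

A string cannot be modified in place; any operation that appears to modify a string actually creates a new one.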
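Immutability can be observed directly: item assignment fails, and "modifying" methods return new objects. A small sketch with illustrative variable names:

```python
s = 'furzoom'

# Item assignment is rejected because str is immutable.
try:
    s[0] = 'F'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead; the original is left unchanged.
t = 'F' + s[1:]
u = s.upper()

print(t)
```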
