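I. Overview

A string is similar to a character array in C (functionally it is closer to string in C++): it is a sequence of characters. Unlike C/C++, Python has no separate character type; a character is simply represented as a string of length 1.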
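
As a small illustration of the last point, the sketch below indexes into a string and checks that each element is itself a string of length 1. The variable names are illustrative only, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
s = 'hello'

# Indexing a string yields another string, not a separate "char" type.
first = s[0]
print(type(first) is type(s))
print(len(first))

# Iterating over a string therefore produces length-1 strings.
chars = list(s)
print(chars)
```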

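II. Classification

Python 2, which this article covers, has two string types: ASCII strings (str) and Unicode strings (unicode). Strictly speaking, a Python 2 str is a sequence of bytes; "ASCII string" is used here as shorthand. Each type can be divided further: by how escape sequences are handled, into regular strings and raw strings; by whether the literal spans lines, into single-line strings and multi-line strings. Every string literal can be enclosed in either single quotes (') or double quotes (").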

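Taking single quotes as an example (for double quotes, replace ' with "), the literal forms map onto the categories above as follows:

Literal	Type	Escape handling	Lines
'Hello, Python'	str	regular	single-line
'''Hello Python'''	str	regular	multi-line
r'Hello, Python'	str	raw	single-line
r'''Hello Python'''	str	raw	multi-line
u'Hello, Python'	unicode	regular	single-line
u'''Hello Python'''	unicode	regular	multi-line
ur'Hello, Python'	unicode	raw	single-line
ur'''Hello Python'''	unicode	raw	multi-line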

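Typical uses of each kind of string literal:

Single quotes and double quotes are essentially equivalent; the choice only matters when the string itself contains a quote character (e.g. "it's ok", 'it "seems" ok')
Single-line strings hold short pieces of text and are the most common in everyday code
Multi-line strings let text span several lines, which is more direct and readable; they are mainly used for code comments, SQL statements, or HTML text
Regular strings interpret escape sequences and process them specially; they are commonly used for file contents or text to be printed
Raw strings do not interpret escape sequences, so every backslash is kept literally (one caveat: a raw literal cannot end with a single backslash); they are mostly used for path names or regular expressions
ASCII strings are effectively English-only strings; in an English-only environment they cover all string-processing needs
Unicode strings can represent text in the world's languages; internationalized software should use Unicode strings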
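
The differences between these literal forms can be sketched as follows. This is an illustration rather than part of the original article: the SQL text and the variable names are made up, and the `u`/`ur` prefixes are left out so that the snippet also runs on Python 3, where `ur` is a syntax error.

```python
# Regular string: '\t' is processed as a single tab character.
regular = 'a\tb'
# Raw string: the backslash and the 't' are kept as two characters.
raw = r'a\tb'
print(len(regular))
print(len(raw))

# Triple quotes let a literal span several lines; the line break is
# stored in the string as '\n'.
multi = '''SELECT name
FROM users'''
lines = multi.splitlines()
print(lines)

# Single and double quotes produce the same string; picking the other
# kind of quote simply avoids escaping an embedded quote.
same = ("it's ok" == 'it\'s ok')
print(same)
```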

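III. Operations

In addition to the general sequence operations described in "Python Basics: Sequences", strings support the following string operations:

Operation	Description
str.capitalize()	Uppercase the first character of str and lowercase the rest (non-letters are unaffected)
str.center(width[, fillchar])	Center str: return a string of length width with str in the middle, padded on both sides with fillchar
str.count(sub[, start[, end]])	Return the number of non-overlapping occurrences of sub in str[start:end]
str.decode([encoding[, errors]])	Decode str using the codec encoding
str.encode([encoding[, errors]])	Encode str using the codec encoding
str.endswith(suffix[, start[, end]])	Return True if str[start:end] ends with suffix, otherwise False
str.expandtabs([tabsize])	Replace every '\t' in str with padding spaces so that the text after each '\t' starts at column tabsize*n (n=0,1,2,...)
str.find(sub[, start[, end]])	Return the lowest index in str at which sub occurs within str[start:end], or -1 if it is not found
str.format(*args, **kwargs)	Format the string (the newer format() is recommended over the traditional %)
str.index(sub[, start[, end]])	Like find(), but raise ValueError if sub is not found
str.isalnum()	Return True if str is non-empty and contains only letters or digits, otherwise False
str.isalpha()	Return True if str is non-empty and contains only letters, otherwise False
str.isdigit()	Return True if str is non-empty and contains only digits, otherwise False
str.islower()	Return True if str contains at least one letter and all of its letters are lowercase, otherwise False
str.isspace()	Return True if str is non-empty and contains only whitespace, otherwise False
str.istitle()	Return True if str is non-empty and title-cased (see title()), otherwise False
str.isupper()	Return True if str contains at least one letter and all of its letters are uppercase, otherwise False
str.join(iterable)	Concatenate the strings produced by iterable, using str as the separator
str.ljust(width[, fillchar])	Like center(), but left-justify str and pad on the right with fillchar
str.lower()	Lowercase all letters in str
str.lstrip([chars])	Remove leading characters of str that appear in chars (whitespace by default)
str.partition(sep)	If sep occurs in str, split at its first occurrence and return the tuple (part before sep, sep, part after sep); otherwise return (str, '', '')
str.replace(old, new[, count])	Replace occurrences of old in str with new, performing at most count replacements (all of them if count is omitted)
str.rfind(sub[, start[, end]])	Reverse counterpart of find(): return the highest index
str.rindex(sub[, start[, end]])	Reverse counterpart of index(): return the highest index
str.rjust(width[, fillchar])	Like center(), but right-justify str and pad on the left with fillchar
str.rpartition(sep)	Reverse counterpart of partition(): if sep occurs, split at its last occurrence and return (part before sep, sep, part after sep); otherwise return ('', '', str)
str.rsplit([sep[, maxsplit]])	Split str from right to left on sep (whitespace if sep is omitted), performing at most maxsplit splits (all of them if omitted)
str.rstrip([chars])	Remove trailing characters of str that appear in chars (whitespace by default)
str.split([sep[, maxsplit]])	Like rsplit(), but split str from left to right
str.splitlines([keepends])	Split str at line boundaries (universal newlines); if keepends is given and true, keep the line-break characters in the result
str.startswith(prefix[, start[, end]])	Counterpart of endswith(): return True if str[start:end] starts with prefix
str.strip([chars])	Combine lstrip() and rstrip(): remove whitespace (or the given characters) from both ends of str
str.swapcase()	Swap the case of every letter in str: uppercase becomes lowercase and vice versa
str.title()	Title-case str: uppercase each letter that starts the string or follows a non-letter, and lowercase each letter that follows a letter
str.translate(table[, deletechars])	Delete the characters listed in deletechars from str, then, if table is not None, map the remaining characters through table
str.upper()	Uppercase all lowercase letters in str
str.zfill(width)	Return a string of length width by padding str on the left with '0'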

Examples of the operations above:

# capitalize()
>>> 'hello'.capitalize(), '-*-HELLO'.capitalize()
('Hello', '-*-hello')

# ljust(), center() and rjust()
>>> s = 'hello'
>>> s.ljust(2), s.ljust(10), s.ljust(10, '=')
('hello', 'hello     ', 'hello=====')
>>> s.center(2), s.center(10), s.center(10, '=')
('hello', '  hello   ', '==hello===')
>>> s.rjust(2), s.rjust(10), s.rjust(10, '=')
('hello', '     hello', '=====hello')

# count()
>>> 'abababababa'.count('a'), 'abababababa'.count('aba')
(6, 3)

# encode() and decode()
>>> s = u'你好'
>>> e = s.encode('utf-8')
>>> d = e.decode('utf-8')
>>> s, e, d
(u'\u4f60\u597d', '\xe4\xbd\xa0\xe5\xa5\xbd', u'\u4f60\u597d')

# startswith() and endswith()
>>> s = 'hello, python'
>>> s.startswith('lo'), s.startswith('lo', 3)
(False, True)
>>> s.startswith('he'), s.startswith('he', 3)
(True, False)
>>> s.endswith('py'), s.endswith('py', 0, -4)
(False, True)
>>> s.endswith('python'), s.endswith('python', 0, -4)
(True, False)

# expandtabs()
>>> '01\t012\t0123\t01234'.expandtabs() # tabsize defaults to 8
'01      012     0123    01234'
>>> '01\t012\t0123\t01234'.expandtabs(4)
'01  012 0123    01234'

# find(), rfind(), index() and rindex()
>>> s = 'goooogle'
>>> s.find('o'), s.rfind('o')
(1, 4)
>>> s.index('o'), s.rindex('o')
(1, 4)
>>> s.find('w')
-1
>>> s.index('w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

# format() and %
>>> "I'm %s, there're %d 'l's in my name." % ('RussellLuo', 3)
"I'm RussellLuo, there're 3 'l's in my name."
>>> "I'm {0}, there're {1} 'l's in my name.".format('RussellLuo', 3)
"I'm RussellLuo, there're 3 'l's in my name."
>>> "I'm {name}, there're {count} 'l's in my name.".format(name='RussellLuo', count=3)
"I'm RussellLuo, there're 3 'l's in my name."

# isalnum(), isalpha(), isdigit() and isspace()
>>> 'abc'.isalnum(), 'abc'.isalpha()
(True, True)
>>> '123'.isalnum(), '123'.isdigit()
(True, True)
>>> 'abc123'.isalnum(), 'abc123'.isalpha(), 'abc123'.isdigit()
(True, False, False)
>>> ''.isspace(), '  \t\r\n'.isspace()
(False, True)

# isupper(), islower(), upper(), lower() and swapcase()
>>> s = 'RussellLuo'
>>> s.isupper(), s.upper()
(False, 'RUSSELLLUO')
>>> s.islower(), s.lower()
(False, 'russellluo')
>>> s.swapcase()
'rUSSELLlUO'

# istitle() and title()
>>> s = "I'm RussellLuo, there're 3 'l's in my name."
>>> s.istitle(), s.title()
(False, "I'M Russellluo, There'Re 3 'L'S In My Name.")

# join()
>>> l = ['how', 'are', 'you']
>>> ''.join(l)
'howareyou'
>>> ' '.join(l)
'how are you'
>>> '...'.join(l)
'how...are...you'

# lstrip(), rstrip() and strip()
>>> s = u'人人为我，我为人人'
>>> print s.lstrip(u'人')
为我，我为人人
>>> print s.rstrip(u'人')
人人为我，我为
>>> print s.strip(u'人')
为我，我为

# partition() and rpartition()
>>> s = 'where are you, here you are'
>>> s.partition('you')
('where are ', 'you', ', here you are')
>>> s.rpartition('you')
('where are you, here ', 'you', ' are')
>>> s.partition('him')
('where are you, here you are', '', '')
>>> s.rpartition('him')
('', '', 'where are you, here you are')

# replace()
>>> s = 'goooooooooogle'
>>> s.replace('o', 'u'), s.replace('o', 'u', 5)
('guuuuuuuuuugle', 'guuuuuooooogle')

# split() and rsplit()
>>> s = 'how are you'
>>> s.split()
['how', 'are', 'you']
>>> s.rsplit()
['how', 'are', 'you']
>>> s.split('o')
['h', 'w are y', 'u']
>>> s.rsplit('o')
['h', 'w are y', 'u']
>>> s.split('o', 1)
['h', 'w are you']
>>> s.rsplit('o', 1)
['how are y', 'u']

# splitlines()
>>> s = 'ab c\n\nde fg\rkl\r\n'
>>> s.splitlines()
['ab c', '', 'de fg', 'kl']
>>> s.splitlines(True)
['ab c\n', '\n', 'de fg\r', 'kl\r\n']

# translate()
>>> s = 'look at this'
>>> s.translate(None, 'lost')
'k a hi'
>>> import string
>>> t = string.maketrans(' ahik', 'egndl')
>>> s.translate(t, 'lost')
'legend'

# zfill()
>>> '12'.zfill(10), '-12'.zfill(10)
('0000000012', '-000000012')
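
The individual methods are often chained in practice. The sketch below is a hypothetical helper, not something from the original article: `parse_settings` and its 'key = value' input format are assumptions made for illustration. It combines splitlines(), strip(), startswith(), partition() and lower() to read simple settings text.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if not sep:
            # No '=' on this line, so there is nothing to store.
            continue
        settings[key.strip().lower()] = value.strip()
    return settings


sample = '# demo\nName = RussellLuo\n\nLANG=python\nbroken line\n'
result = parse_settings(sample)
print(sorted(result.items()))
```

partition() is used instead of split('=') so that a value that itself contains '=' stays in one piece.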

IV. Formatting

1. %

Strings are traditionally formatted with the % operator. With %, the type and format of each argument are given by a conversion specifier, while the argument values can be supplied in two forms: a tuple or a dictionary.

# 1. Tuple input (a single value may also be passed on its own)
>>> print '0x%x' % 108
0x6c
>>> print 'hello, %s' % ('world',)
hello, world
>>> print '%s has %03d quote types' % ('Python', 2)
Python has 002 quote types

# 2. Dictionary input
>>> print '%(language)s has %(number)03d quote types.' % \
...       {'language': 'Python', 'number': 2}
Python has 002 quote types.
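
A few more conversion specifiers, as a hedged sketch: the sample values and variable names are made up, and parenthesized print() calls are used so that the snippet runs on both Python 2 and Python 3.

```python
# %-8s left-aligns in a field of width 8; %5d right-aligns in width 5.
row = '%-8s|%5d|' % ('Python', 2)
# %.2f keeps two digits after the decimal point.
ratio = '%.2f' % 3.14159
# %r inserts repr() of the value; %% is a literal percent sign.
note = '%r is 100%% quoted' % 'text'
# To format a tuple as one value, wrap it in a one-element tuple.
pair = '%s' % ((1, 2),)

print(row)
print(ratio)
print(note)
print(pair)
```

Note the trailing comma in `((1, 2),)`: without it, % would treat `(1, 2)` as two separate arguments and raise a TypeError.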

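2. str.format()

Starting with version 2.6, Python provides a new formatting method, str.format(), and recommends it over the traditional % operator (in fact, str.format() is the standard string-formatting method in Python 3).

Compared with the traditional % operator, str.format() has the following characteristics:

The type of an argument does not have to be specified
Argument values are supplied as positional arguments and keyword arguments
In the format string, replacement fields enclosed in {} act as placeholders for the expected arguments
Inside a replacement field you can access an argument's attributes or items, and apply str() or repr() to the argument

Some examples of str.format():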

# Positional arguments
>>> '{} has {} quote types.'.format('Python', 2)
'Python has 2 quote types.'
>>> '{0} has {1} quote types.'.format('Python', 2)
'Python has 2 quote types.'

# Keyword arguments
>>> '{language} has {number} quote types.'.format(language='Python', number=2)
'Python has 2 quote types.'

# Accessing an attribute of an argument
>>> class N: pass
...
>>> N.value = 2
>>> '{0} has {1.value} quote types.'.format('Python', N)
'Python has 2 quote types.'

# Accessing an item of an argument
>>> '{data[0]} has {data[1]} quote types.'.format(data=['Python', 2])
'Python has 2 quote types.'

# !s calls str() on the argument, !r calls repr()
>>> '{0!s} has {1!r} quote types.'.format('Python', str(2))
"Python has '2' quote types."

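Replacement fields also accept a format specification after a colon, which covers the width, padding and precision controls that % offers. The following is a minimal sketch with made-up sample values; explicit field indices are used so that it should also work on Python 2.6.

```python
# '<', '^' and '>' align the value within a field of the given width;
# an optional fill character may precede the alignment sign.
left = '{0:<8}|'.format('Python')
centered = '{0:=^11}'.format('hello')
right = '{0:>5}|'.format(2)

# Zero padding and fixed precision mirror '%03d' and '%.2f'.
padded = '{0:03d}'.format(2)
fixed = '{0:.2f}'.format(3.14159)

# A positional argument can be referenced more than once.
twice = '{0}-{0}'.format('ab')

for value in (left, centered, right, padded, fixed, twice):
    print(value)
```
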
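V. Unicode

Modern software and applications should support internationalization (abbreviated I18N), and should therefore use Unicode.

Internationalizing software is a systematic effort. The golden rules for using Unicode are:

Prefix every string literal with u
Use the unicode() function (rather than str())
Call encode() only on output (when writing data to a file, database, or network), and call decode() only on input (when reading data from a file, database, or network)
Make sure third-party dependencies (modules, databases, web frameworks, and so on) support Unicode

A simple example of using Unicode correctly: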

#!/usr/bin/env python
# -*- coding: utf-8 -*-

''' Use Unicode for file input and output (the encoding is utf-8) '''

CODEC = u'utf-8'
FILE = ur'/home/russellluo/python/unicode.txt'

text = u'English中文'

# Output: encode just before writing
text_cod = text.encode(CODEC)
with open(FILE, 'w') as f:
    f.write(text_cod)

# Input: decode right after reading
with open(FILE, 'r') as f:
    text_cod = f.read()
text_dec = text_cod.decode(CODEC)

print text_dec

Output:

$ python unicode_example.py
English中文
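
As an alternative sketch of the same encode-on-output, decode-on-input rule, `io.open` (available since Python 2.6 and identical to the built-in `open` in Python 3) can do the encoding and decoding at the file boundary. The temporary file below is an assumption that stands in for the real path, and the text is the same as in the script, written with escapes.

```python
import io
import os
import tempfile

CODEC = 'utf-8'
text = u'English\u4e2d\u6587'

# A temporary file stands in for the real path (assumption of this sketch).
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Output: io.open encodes the Unicode text while writing.
    with io.open(path, 'w', encoding=CODEC) as f:
        f.write(text)
    # Input: io.open decodes the bytes while reading.
    with io.open(path, 'r', encoding=CODEC) as f:
        text_dec = f.read()
finally:
    os.remove(path)

print(text_dec == text)
```

With this approach the program only ever handles Unicode strings, and the codec name appears in exactly two places.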

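VI. Related Modules

The core string-related modules in the Python standard library are listed below (see String Services for more details):

Module	Description
string	Functions and utilities for string operations
re	Regular expressions: a powerful string pattern-matching module
struct	Conversion between strings and packed binary data
difflib	Computing differences between sequences
StringIO / cStringIO	String buffer objects with a file-like interface
textwrap	Text wrapping and filling
codecs	Codec registry and base classes
unicodedata	Unicode character database
stringprep	Preparing Unicode strings for use in Internet protocols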
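
A brief tour of a few of these modules, as a sketch only: the sample inputs are made up, and StringIO / cStringIO are left out because they were replaced by the io module in Python 3.

```python
import difflib
import re
import struct
import textwrap
import unicodedata

# re: extract every run of digits.
numbers = re.findall(r'\d+', 'Python 2.6 and 3')

# struct: pack two unsigned 16-bit big-endian integers into 4 bytes.
packed = struct.pack('>HH', 1, 2)
unpacked = struct.unpack('>HH', packed)

# difflib: similarity ratio between two sequences (1.0 means identical).
ratio = difflib.SequenceMatcher(None, 'google', 'goggle').ratio()

# textwrap: wrap text to a maximum line width.
lines = textwrap.wrap('how are you, here you are', width=12)

# unicodedata: look up the official name of a character.
name = unicodedata.name(u'\u00e9')

print(numbers)
print(unpacked)
print(lines)
print(name)
```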
