The shlex module provides a simple lexer (that is, a tokenizer) for languages based on Unix shell syntax.
For example, suppose there is a text file quotes.txt:
This string has embedded "double quotes" and 'single quotes' in it,
and even "a 'nested example'".
Python code (test.py):
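#!/usr/bin/env python3
import shlex
import sys

# Expect exactly one argument: the file to tokenize
if len(sys.argv) != 2:
    print('Usage: ./test.py FILENAME')
    sys.exit(1)

filename = sys.argv[1]
with open(filename, 'rt') as f:
    body = f.read()
print('ORIGINAL:', repr(body))

print('TOKENS:')
# A shlex object is iterable and yields one token at a time
lexer = shlex.shlex(body)
for token in lexer:
    print(repr(token))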
Run the script:
./test.py quotes.txt
ORIGINAL: 'This string has embedded "double quotes" and \'single quotes\' in it,\nand even "a \'nested example\'".\n'
TOKENS:
'This'
'string'
'has'
'embedded'
'"double quotes"'
'and'
"'single quotes'"
'in'
'it'
','
'and'
'even'
'"a \'nested example\'"'
'.'
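As the output shows, shlex keeps each quoted phrase together as a single token (including the nested quotes) and splits punctuation such as ',' and '.' into tokens of their own. For this kind of shell-style quoted text, that is much more convenient than writing regular expressions by hand.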
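The script above uses shlex.shlex in its default (non-POSIX) mode, where the quote characters stay inside the tokens. As a rough sketch for comparison, the standard library also offers shlex.split, which runs in POSIX mode by default: it strips the quotes and splits only on whitespace, so punctuation stays attached to the neighbouring word. The variable names here (text, raw_tokens, posix_tokens) are made up for illustration, and the text is inlined instead of being read from quotes.txt.

```python
import shlex

# Same content as quotes.txt, inlined so the example is self-contained
text = ('This string has embedded "double quotes" and \'single quotes\' in it,\n'
        'and even "a \'nested example\'".\n')

# Default (non-POSIX) lexer: quote characters are kept in the tokens
raw_tokens = list(shlex.shlex(text))

# shlex.split uses POSIX mode: quotes are removed, split on whitespace only
posix_tokens = shlex.split(text)

print('NON-POSIX:', raw_tokens)
print('POSIX:    ', posix_tokens)
```

Which form is more useful depends on the task: the non-POSIX tokens preserve how each phrase was quoted, while shlex.split gives the values a shell would pass to a program as arguments.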