Finding Nestable String Groups with Regular Expressions in Python

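I came across a small requirement online that calls for regular expressions. The original requirement reads:

Find the sentences in a text that contain "因为 ... 所以" ("because ... so"). For each one, output the 3 characters before 因为 and the 3 characters after 所以, aligned around the two words, and output everything between them in full. If another 因为 ... 所以 pair occurs between the outer 因为 and 所以, find it as well and output it on its own line. The output format is:

line number, preceding 3 characters, *因为*, everything in between, &所以&, following 3 characters (a punctuation mark counts as one character)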

2 还不是 *因为* 这里好， &所以& 没有人

The implementation is as follows:

    # -*- coding: utf-8 -*-
    # Python 3 version of the original Python 2 listing.
    import re

    # '\u56e0\u4e3a' and '\u6240\u4ee5' are the Unicode escapes (code points)
    # for "因为" and "所以".
    pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')

    def getPairStriList(filename):
        pairStrList = []
        with open(filename, 'r', encoding='utf8') as textFile:
            for lineNo, line in enumerate(textFile, 1):
                result = pattern.search(line)
                while result:
                    resultStr = result.group()
                    pairStrList.append((lineNo, resultStr))
                    # Search again inside the previous match, trimming both
                    # ends so the outer pair is excluded and only a nested
                    # "因为 ... 所以" can match.
                    result = pattern.search(resultStr, 2, len(resultStr) - 2)
        # Convert each match to the output format and prefix the line number
        # of the source line, as the requirement asks.
        formatted = []
        for lineNo, s in pairStrList:
            s = s[:3] + s[3:5].replace('\u56e0\u4e3a', ' *\u56e0\u4e3a* ', 1) + s[5:]
            s = s[:-5] + s[-5:].replace('\u6240\u4ee5', ' &\u6240\u4ee5& ', 1)
            formatted.append(str(lineNo) + ' ' + s)
        return formatted

    if __name__ == '__main__':
        for pairStr in getPairStriList('test.txt'):
            print(pairStr)
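To see the nested search at work without a file, here is a minimal sketch that applies the same idea to a single string. The names `PAIR`, `find_pairs`, and `format_pair` and the sample sentence are made up for this illustration and are not part of the original listing. The key step is `Pattern.search(string, pos, endpos)`: searching again inside the previous match with both ends trimmed means the outer pair cannot match a second time.

```python
import re

# "因为 ... 所以" with exactly three characters of context on each side.
PAIR = re.compile('.{3}因为.*所以.{3}')


def find_pairs(line):
    """Return the outermost match in `line`, followed by any nested matches."""
    found = []
    result = PAIR.search(line)
    while result:
        text = result.group()
        found.append(text)
        # Search inside the previous match, trimming two characters from
        # each end so the same outer pair cannot match again.
        result = PAIR.search(text, 2, len(text) - 2)
    return found


def format_pair(text):
    """Format one match: text must be a string matched by PAIR."""
    # text[:3] is the leading context, text[3:5] is 因为,
    # text[-5:-3] is 所以, and text[-3:] is the trailing context.
    return '{} *因为* {} &所以& {}'.format(text[:3], text[5:-5], text[-3:])


sample = '他说还不是因为这里好，所以没有人走。'
for item in find_pairs(sample):
    print(format_pair(item))
```

For the sample sentence this prints `还不是 *因为* 这里好， &所以& 没有人`. Because `.*` is greedy, the first search always takes the widest pair on the line, and each later search can only find a pair strictly inside the previous one.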

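PS: a look at nesting groups in Python regular expressions

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example demonstrates nested groups: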

				 
    # Python 3.6
    # 蔡军生
    # http://blog.csdn.net/caimouse/article/details/51749579
    #
    import re

    def test_patterns(text, patterns):
        """Given source text and a list of patterns, look for
        matches for each pattern within the text and print
        them to stdout.
        """
        # Look for each pattern in the text and print the results
        for pattern, desc in patterns:
            print('{!r} ({})\n'.format(pattern, desc))
            print(' {!r}'.format(text))
            for match in re.finditer(pattern, text):
                s = match.start()
                e = match.end()
                prefix = ' ' * (s)
                print(
                    ' {}{!r}{} '.format(prefix,
                                        text[s:e],
                                        ' ' * (len(text) - e)),
                    end=' ',
                )
                print(match.groups())
                if match.groupdict():
                    print('{}{}'.format(
                        ' ' * (len(text) - s),
                        match.groupdict()),
                    )
            print()
        return
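The helper above relies on `re.finditer`, `match.start()`, `match.end()`, `match.groups()`, and `match.groupdict()`. As a rough sketch of what those calls return for nested groups, the snippet below gives the groups names so that `groupdict()` has something to report. The names `rest`, `a_run`, and `b_run` are invented for this illustration.

```python
import re

text = 'abbaabbba'
# Same shape as a((a*)(b*)), but with hypothetical group names.
pattern = r'a(?P<rest>(?P<a_run>a*)(?P<b_run>b*))'

# Collect one row per match: start, end, positional groups, named groups.
rows = [
    (m.start(), m.end(), m.groups(), m.groupdict())
    for m in re.finditer(pattern, text)
]

for row in rows:
    print(row)
```

Named groups still appear in `groups()` in their usual positions, so naming a nested group does not change the numbering.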

Example, assuming test_patterns is saved in a module named re_test_patterns_groups.py:

				 
    # Python 3.6
    # 蔡军生
    # http://blog.csdn.net/caimouse/article/details/51749579
    #
    from re_test_patterns_groups import test_patterns

    test_patterns(
        'abbaabbba',
        [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
    )

The output is as follows:

				 
    'a((a*)(b*))' (a followed by 0-n a and 0-n b)

     'abbaabbba'
     'abb'        ('bb', '', 'bb')
        'aabbb'   ('abbb', 'a', 'bbb')
             'a'  ('', '', '')
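The order of the tuples in that output follows from how nested groups are numbered: groups are counted by their opening parenthesis, left to right, so the outer group comes before the groups it contains. A small sketch on one of the matched substrings, using only standard `re` calls:

```python
import re

match = re.search(r'a((a*)(b*))', 'aabbb')

print(match.group(0))  # the whole match
print(match.group(1))  # outer group: everything after the first 'a'
print(match.group(2))  # first inner group: the run of a
print(match.group(3))  # second inner group: the run of b

# span(n) gives the start and end offsets of group n in the searched string.
print(match.span(1), match.span(2), match.span(3))
```

Here group 1 is `'abbb'`, and its span `(1, 5)` covers the spans of group 2, `(1, 2)`, and group 3, `(2, 5)`, which is what nesting means in terms of offsets.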

Summary

That covers using regular expressions in Python to find nestable string groups. I hope it helps; if you have any questions, please leave a comment.
