As shown below:
# coding=utf-8
import sys, re, os


def getDictList(dictPath):
    # Match runs of word characters and common ASCII punctuation (one token per run).
    regx = r'''[\w\~`\!\@\#\$\%\^\&\*\(\)\_\-\+\=\[\]\{\}\:\;\,\.\/\<\>\?]+'''
    with open(dictPath) as f:
        data = f.read()  # reads the whole file into memory
    return re.findall(regx, data)


def rmdp(dictList):
    # Remove duplicates via a set; the original order is not preserved.
    return list(set(dictList))


def fileSave(dictRmdp, out):
    # Append mode: any existing content in the output file is kept.
    with open(out, 'a') as f:
        for line in dictRmdp:
            f.write(line + '\n')


def main():
    try:
        dictPath = sys.argv[1].strip()
        out = sys.argv[2].strip()
    except Exception as e:
        print('error:', e)
        me = os.path.basename(__file__)
        print('usage: %s <input> <output>' % me)
        print('example: %s dict.txt dict_rmdp.txt' % me)
        sys.exit(1)
    dictList = getDictList(dictPath)
    dictRmdp = rmdp(dictList)
    fileSave(dictRmdp, out)


if __name__ == '__main__':
    main()
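The script loads the entire input with a single `f.read()`, so peak memory grows with the file size. For very large files, one possible variation is to scan the input line by line and keep only the set of tokens already seen. The following is a minimal sketch of that idea, not the original author's method: the names `iter_unique_tokens` and `dedupe_file` are hypothetical, the character class is a simplified spelling of the same pattern, and it assumes the set of unique tokens still fits in memory. Unlike `list(set(...))`, it writes tokens in first-seen order.

```python
import re

# Same character class as the script: word characters plus common ASCII punctuation.
TOKEN_RE = re.compile(r"[\w~`!@#$%^&*()\-+=\[\]{}:;,./<>?]+")


def iter_unique_tokens(lines):
    """Yield each token the first time it appears, scanning one line at a time."""
    seen = set()  # assumption: the unique tokens fit in memory
    for line in lines:
        for token in TOKEN_RE.findall(line):
            if token not in seen:
                seen.add(token)
                yield token


def dedupe_file(src_path, dst_path):
    """Stream src_path into dst_path, one unique token per output line."""
    with open(src_path, encoding="utf-8", errors="ignore") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for token in iter_unique_tokens(src):
            dst.write(token + "\n")
```

Because the pattern never matches a newline, tokenizing line by line finds the same tokens as tokenizing the whole file at once. Here `dedupe_file` opens the output in write mode, so it overwrites the target instead of appending to it.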
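That is all for this example of efficient deduplication in Python with support for GB-scale files. We hope it serves as a useful reference, and we hope you will continue to support 服务器之家.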
Original link: https://blog.csdn.net/meinaozi/article/details/79326512