I have a log file that is full of tweets. Each tweet is on its own line so that I can iterate through the file easily.
An example tweet looks like this:
@ sample This is a sample string $ 1.00 # sample
I want to clean this up a bit by removing the whitespace between each special character and the alphanumeric character that follows it ("@ s", "$ 1", "# s"), so that it looks like this:
@sample This is a sample string $1.00 #sample
I'm trying to use regular expressions to match these instances because they can vary, but I'm unsure how to go about it.
I've been using re.sub() and re.search() to find the instances, but I'm struggling to work out how to remove only the whitespace while leaving the rest of the string intact.
Here is the code I have so far:
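#!/usr/bin/python
import csv
import re

f = open('output.csv', 'w')  # opened for the cleaned output (nothing written yet)
with open('retweet.csv', 'rb') as inputfile:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        a = row[0]  # the tweet text is in the first column
        # Raw string so "\W\s\w" reaches the regex engine unchanged
        matchObj = re.search(r"\W\s\w", a)
        if matchObj:  # re.search returns None when nothing matches
            print(matchObj.group())
f.close()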
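For comparison, here is a minimal sketch of the same loop using re.sub with capture groups instead of re.search, so only the whitespace is dropped and the symbol and following character are written back. It assumes Python 3, that the tweet text sits in column 0 of each CSV row, and reuses the file names from the question; `clean_tweet` and `clean_file` are hypothetical helper names.

```python
import csv
import re

# A symbol from the question, then whitespace, then an alphanumeric character.
TAG_GAP = re.compile(r'([@#$])\s+([A-Za-z0-9])')


def clean_tweet(text):
    """Turn '@ sample' into '@sample', '$ 1.00' into '$1.00', '# sample' into '#sample'."""
    # \1 and \2 re-insert the symbol and the character; the whitespace is dropped.
    return TAG_GAP.sub(r'\1\2', text)


def clean_file(in_path, out_path):
    """Hypothetical helper: assumes the tweet text is in column 0 of each row."""
    with open(in_path, newline='', encoding='utf-8') as src, \
            open(out_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row:  # leave blank rows untouched
                row[0] = clean_tweet(row[0])
            writer.writerow(row)
```

Calling `clean_file('retweet.csv', 'output.csv')` would then write the cleaned rows; adjust the paths and column index to match the real file.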
Thanks for any help!
3 answers
#1 (score: 5)
Something like this, using re.sub:
>>> import re
>>> strs = "@ sample This is a sample string $ 1.00 # sample"
>>> re.sub(r'([@#$])(\s+)([a-z0-9])', r'\1\3', strs, flags=re.I)
'@sample This is a sample string $1.00 #sample'
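A short sketch of how this pattern behaves: group 2 (the whitespace) is left out of the replacement, and because group 3 must be `[a-z0-9]` (case-insensitive via `re.I`), a symbol is joined only when an alphanumeric character follows it. The `join_symbols` wrapper name is illustrative, not part of the answer.

```python
import re

PATTERN = r'([@#$])(\s+)([a-z0-9])'


def join_symbols(text):
    # Keep groups 1 and 3 (symbol and next character); drop group 2 (whitespace).
    return re.sub(PATTERN, r'\1\3', text, flags=re.I)


print(join_symbols("# Sample with @   ABC"))  # '#Sample with @ABC'
print(join_symbols("price: $ !"))  # unchanged: '!' is not alphanumeric
```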
#2 (score: 1)
>>> re.sub("([@$#]) ", r"\1", "@ sample This is a sample string $ 1.00 # sample")
'@sample This is a sample string $1.00 #sample'
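One caveat with this answer, sketched below: the pattern matches exactly one literal space, so a run of several spaces (or a tab) after the symbol is only partly removed. Swapping the space for `\s+` removes the whole run; unlike answer #1, neither version checks what character follows. The sample string is made up for illustration.

```python
import re

text = "@  sample and $ 1.00"  # two spaces after '@'

# Answer #2's pattern: removes exactly one literal space after the symbol.
one_space = re.sub("([@$#]) ", r"\1", text)

# Variant: \s+ removes any run of whitespace (spaces, tabs) after the symbol.
any_space = re.sub(r"([@$#])\s+", r"\1", text)

print(one_space)  # '@ sample and $1.00'
print(any_space)  # '@sample and $1.00'
```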
#3 (score: 0)
This seemed to work pretty nicely.
print(re.sub(r'([@#$])\s+', r'\1', '@ blah $ 1'))
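A different technique that needs no backreference at all, sketched here under Python 3: a lookbehind `(?<=[@#$])` and a lookahead `(?=\w)` only assert context, so the match consists of the whitespace alone and can be replaced with an empty string. Note that `\w` also accepts underscores and, for `str` patterns, Unicode letters; the `tidy` name is illustrative.

```python
import re

# The lookarounds are zero-width: the symbol and the following word character
# are not consumed, so only the whitespace between them is replaced with ''.
GAP = re.compile(r'(?<=[@#$])\s+(?=\w)')


def tidy(text):
    return GAP.sub('', text)


print(tidy('@ blah $ 1 # tag'))  # '@blah $1 #tag'
```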