I have text report files that I need to "split()", the way a string is split into an array.
The file looks like this:
BOBO:12341234123412341234
1234123412341234123412341
123412341234
BOBO:12349087609812340-98
43690871234509875
45
BOBO:32498714235908713248
0987235
I want to create 3 sub-files out of that, splitting on lines that begin with "BOBO:" (that is, matching "^BOBO:"). I don't really want 3 physical files; I'd prefer 3 different file pointers.
2 solutions
#1
Perhaps use itertools.groupby:
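import itertools

def bobo(x):
    # Bump the counter at every line that starts a new stanza,
    # so all lines of one stanza share the same key.
    if x.startswith('BOBO:'):
        bobo.count += 1
    return bobo.count
bobo.count = 0

with open('a') as f:
    for key, grp in itertools.groupby(f, bobo):
        print(key, list(grp))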
yields (as printed by Python 2):
(1, ['BOBO:12341234123412341234\n', '1234123412341234123412341\n', '123412341234\n'])
(2, ['BOBO:12349087609812340-98\n', '43690871234509875\n', '45\n', '\n'])
(3, ['BOBO:32498714235908713248\n', '0987235\n'])
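The grouping can also be tried without a file on disk. The following is a minimal sketch, assuming Python 3; `make_counter` and `sample` are illustrative names that do not appear in the answer above, and the sample text is fed from an in-memory string. It relies on the same idea: the key only changes at a line starting with "BOBO:", so groupby keeps each header together with the lines that follow it.

```python
import io
import itertools

def make_counter(prefix):
    """Return a key function whose value increases at every line starting with prefix."""
    count = 0
    def key(line):
        nonlocal count
        if line.startswith(prefix):
            count += 1
        return count
    return key

# Hypothetical sample input, shaped like the report in the question.
sample = (
    "BOBO:12341234123412341234\n"
    "1234123412341234123412341\n"
    "123412341234\n"
    "BOBO:12349087609812340-98\n"
    "43690871234509875\n"
    "45\n"
    "BOBO:32498714235908713248\n"
    "0987235\n"
)

# io.StringIO stands in for the open file; iterating over it yields lines.
groups = [
    (key, list(grp))
    for key, grp in itertools.groupby(io.StringIO(sample), make_counter("BOBO:"))
]
for key, lines in groups:
    print(key, lines)
```

Keeping the counter in a closure rather than as a function attribute means every call to `make_counter` starts from zero, so the key function can be reused on a second file without resetting anything by hand.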
Since you say you don't want physical files, the whole file must fit in memory. In that case, to create file-like objects, use the cStringIO module (Python 2 only; in Python 3 the equivalent class is io.StringIO):
import cStringIO

with open('a') as f:
    file_handles = []
    for key, grp in itertools.groupby(f, bobo):
        # Join the lines of one stanza and wrap them in an in-memory file.
        file_handles.append(cStringIO.StringIO(''.join(grp)))
file_handles will be a list of file-like objects, one for each "BOBO:" stanza.
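On Python 3 the cStringIO module no longer exists, so here is a hedged sketch of the same approach using io.StringIO. The helper name `split_stanzas` and the short `report` string are assumptions made for illustration; in real use you would pass the object returned by open('a') instead.

```python
import io
import itertools

def split_stanzas(lines, prefix="BOBO:"):
    """Return one io.StringIO per stanza; a stanza starts at a line beginning with prefix."""
    count = 0
    def key(line):
        nonlocal count
        if line.startswith(prefix):
            count += 1
        return count
    return [io.StringIO("".join(grp)) for _, grp in itertools.groupby(lines, key)]

# Stand-in for an opened report file (hypothetical contents).
report = io.StringIO("BOBO:1\n2\nBOBO:3\n4\n5\n")
file_handles = split_stanzas(report)

for fh in file_handles:
    print(repr(fh.read()))
    fh.seek(0)  # rewind so the handle can be read again later
```

Note that any lines appearing before the first "BOBO:" line get key 0 and end up in a handle of their own.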
#2
If you are fine with keeping the sections in memory while you work with them, something like this should work:
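subFileBlocks = []
with open('myReportFile.txt') as fh:
    for line in fh:
        if line.startswith('BOBO:'):
            # A new section starts here.
            subFileBlocks.append(line)
        elif subFileBlocks:
            # Continuation line: append it to the current section.
            # Lines before the first 'BOBO:' line are skipped.
            subFileBlocks[-1] += line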
When the loop finishes, subFileBlocks should contain your sections as strings.
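Since the question asks for file pointers rather than strings, each section string can be wrapped in an in-memory file object. This is only a sketch, assuming Python 3; `sub_files` is an illustrative name, and the two sample strings stand in for whatever ends up in subFileBlocks.

```python
import io

# Sample sections, shaped like the strings collected in subFileBlocks.
subFileBlocks = [
    "BOBO:12341234123412341234\n1234123412341234123412341\n",
    "BOBO:32498714235908713248\n0987235\n",
]

# Each io.StringIO behaves like an open text file: read(), readline(), iteration.
sub_files = [io.StringIO(block) for block in subFileBlocks]

# Read just the header line of every section.
first_lines = [f.readline().rstrip("\n") for f in sub_files]
print(first_lines)
```

After the readline() calls, each handle is positioned just past its header, so a following read() returns only the remaining lines of that section.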