My data looks like:
1 1.45
1 1.153
2 2.179
2 2.206
2 2.59
2 2.111
3 3.201
3 3.175
4 4.228
4 4.161
4 4.213
The output I want is:
1 2 (1 occurs 2 times)
2 4
3 2
4 3
For this I run the following code:
SubPatent2count = {}
for line in data.split('\n'):
    for num in line.split('\t'):
        Mapper_data = ["%s\t%d" % (num[0], 1) ]
        for line in Mapper_data:
            Sub_Patent,count = line.strip().split('\t',1)
            try:
                count = int(count)
            except ValueError:
                continue
            try:
                SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
            except:
                SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
At the end I get this error:
      3     for num in line.split('\t'):
      4         #print(num[0])
----> 5         Mapper_data = ["%s\t%d" % (num[0], 1) ]
      6         #print(Mapper_data)
      7         for line in Mapper_data:
IndexError: string index out of range
If you have any idea how I can deal with this error, please help. Thank you!
3 solutions
#1
0
Just suggesting another approach: have you tried a list comprehension plus groupby from itertools?
from itertools import groupby
print([(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n')])])
# [x.split(' ')[0] for x in data.split('\n')] builds a list of the first number on each line,
# and groupby groups the consecutive repeats so they can be counted
Or if you want that exact output:
from itertools import groupby
mylist = [(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n')])]
for key, repetition in mylist:
    print(key, repetition)
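One caveat worth noting: itertools.groupby only merges consecutive equal keys, so the approach above relies on the data already being ordered by its first column, as it is in the question. A minimal sketch of the difference follows; the `shuffled` string and the `count_first_column` helper are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

def count_first_column(data, sort_first=True):
    """Count occurrences of the first space-separated field of each non-empty line."""
    keys = [line.split(' ')[0] for line in data.split('\n') if line.strip()]
    if sort_first:
        # groupby only merges adjacent equal keys, so sort to make equal keys adjacent
        keys = sorted(keys)
    return [(key, len(list(group))) for key, group in groupby(keys)]

# Hypothetical sample in which the rows are NOT ordered by the first column
shuffled = "1 1.45\n2 2.179\n1 1.153\n2 2.206"

print(count_first_column(shuffled, sort_first=False))  # each key is split into separate runs
print(count_first_column(shuffled, sort_first=True))   # each key appears once with its total
```

If the real input can arrive unordered, sorting the keys first (or using a dictionary-based count) is probably the safer choice.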
#2
0
Thank you everybody, your suggestions really helped me. I changed my code as follows:
SubPatent2count = {}
for line in data.split('\n'):
    Mapper_data = ["%s\o%d" % (line.split(' ')[0], 1) ]
    for line in Mapper_data:
        Sub_Patent,count = line.strip().split('\o',1)
        try:
            count = int(count)
        except ValueError:
            continue
        try:
            SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
        except:
            SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
And it gives the following result:
1 2 (1 occurs 2 times)
2 4
3 2
4 3
#3
0
num is probably an empty string (for example, from a blank line in data), which is why num[0] raises an index out of range error. Another possibility is that the numbers on each line are in fact separated by spaces, not by tabs.
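To see where the empty string can come from, here is a small sketch. It assumes that `data` ends with a trailing newline, which is only a guess since the question does not show how `data` is built. In that case `data.split('\n')` yields one empty line at the end:

```python
# Hypothetical two-row sample with a trailing newline, standing in for the real data
data = "1 1.45\n1 1.153\n"

lines = data.split('\n')
print(lines)  # the last element is an empty string

for line in lines:
    for num in line.split('\t'):
        if not num:
            # ''.split('\t') returns [''], so num is '' and num[0] would raise IndexError
            print("skipping empty field")
            continue
        print(repr(num[0]))
```

Guarding against the empty field, as in the `if not num` check, is one way to avoid the error without changing the rest of the loop.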
Anyway, your code seems a little strange. For example, you encode the data as a string in a one-element list (Mapper_data) and then decode it again to process it. That is really not necessary and you should avoid it.
Try this code:
from collections import Counter
# Take the first column of every non-empty line and convert it to int
decoded_data = [int(l.split(' ', 1)[0]) for l in data.split('\n') if len(l) > 0]
SubPatent2count = Counter(decoded_data)
for k in SubPatent2count:
    print(k, SubPatent2count[k])
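For comparison, the same count can be written with a plain dictionary and no intermediate encoded string. This is only a sketch: `data` below is the sample from the question typed in as a string literal, and `count_first_field` is a name made up for illustration.

```python
def count_first_field(data):
    """Return a dict mapping the first whitespace-separated field of each line to its count."""
    counts = {}
    for line in data.split('\n'):
        fields = line.split()  # splits on any whitespace, so tabs and spaces both work
        if not fields:
            continue  # skip blank lines instead of indexing into an empty string
        key = fields[0]
        counts[key] = counts.get(key, 0) + 1
    return counts

# Sample rows from the question, entered by hand (assumed to be space-separated)
data = """1 1.45
1 1.153
2 2.179
2 2.206
2 2.59
2 2.111
3 3.201
3 3.175
4 4.228
4 4.161
4 4.213
"""

for key, count in count_first_field(data).items():
    print('%s\t%s' % (key, count))
```

Using `line.split()` with no argument sidesteps the tab-versus-space question, and `dict.get` with a default replaces the try/except around the first insertion.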