Assume I have a list of this type:
#    0   1  2  3   4  5  6  7  8  9  10  11   -- list index
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
I want to find every index that starts a run of n equal consecutive values, that is, every i for which the items in li[i:i+n] are all the same.
I can do it (laboriously) this way:
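def sub_seq(li, n):
    ans = {}
    for x in set(li):
        # Test every window of length n (range() keeps this correct for n == 1 too).
        ans[x] = [i for i in range(len(li) - n + 1)
                  if all(x == y for y in li[i:i+n])]
    # Keep only the values that have at least one qualifying index.
    ans = {k: v for k, v in ans.items() if v}
    return ans

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
for i in (5, 4, 3, 2):
    print i, sub_seq(li, i)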
Prints:
5 {1: [5]}
4 {1: [5, 6]}
3 {1: [5, 6, 7]}
2 {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]}
Is there a better way to do this?
3 Answers
#1
5
Analyzing data is typically easier if you first convert it to a convenient form. In this case, a run-length encoding is a good starting point:
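from itertools import groupby, accumulate
from collections import defaultdict

def sub_seq(li, n):
    d = defaultdict(list)
    # Run-length encoding: one (value, run_length) pair per run of equal values.
    rle = [(k, len(list(g))) for k, g in groupby(li)]
    # The running total of run lengths is the (exclusive) end index of each run.
    endpoints = accumulate(size for k, size in rle)
    for end_index, (value, count) in zip(endpoints, rle):
        # Every start index that leaves room for n items inside the run.
        for index in range(end_index - count, end_index - n + 1):
            d[value].append(index)
    return dict(d)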
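To see why the endpoint arithmetic works, it can help to look at the intermediate values for the sample list. The following is only an illustrative sketch, assuming Python 3 (where itertools.accumulate is available); the names rle and endpoints mirror the function above, while windows is a name introduced here for the demonstration.

```python
from itertools import groupby, accumulate

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]

# One (value, run_length) pair per run of equal values.
rle = [(k, len(list(g))) for k, g in groupby(li)]
# Running total of run lengths: the exclusive end index of each run.
endpoints = list(accumulate(size for _, size in rle))

print(rle)        # [(-1, 2), (2, 2), (-1, 1), (1, 5), (-1, 2)]
print(endpoints)  # [2, 4, 5, 10, 12]

# For n = 3, list the start indices contributed by each run.
# `windows` is a hypothetical name used only for this illustration.
n = 3
windows = [
    (value, list(range(end_index - count, end_index - n + 1)))
    for end_index, (value, count) in zip(endpoints, rle)
]
print(windows)    # only the run of five 1s is long enough: (1, [5, 6, 7])
```

A run that starts at end_index - count and has count items contains count - n + 1 windows of length n, so runs shorter than n produce an empty range and contribute nothing.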
#2
1
As Raymond Hettinger points out in his answer, groupby makes it easier to check consecutive values. If you also enumerate the list, you can keep the corresponding indices and add them to the dictionary (I use defaultdict to make the function as short as possible):
from itertools import groupby
from operator import itemgetter
from collections import defaultdict

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]

def sub_seq(li, n):
    res = defaultdict(list)
    # Group (index, value) pairs by value so each group is one run.
    for k, g in groupby(enumerate(li), itemgetter(1)):
        indices = list(map(itemgetter(0), g))
        if n <= len(indices):
            # Keep only the starts that leave room for n items in the run.
            res[k] += indices[0:len(indices)-n+1]
    return res

for i in (5, 4, 3, 2):
    print i, sub_seq(li, i)
Which prints:
5 defaultdict(<type 'list'>, {1: [5]})
4 defaultdict(<type 'list'>, {1: [5, 6]})
3 defaultdict(<type 'list'>, {1: [5, 6, 7]})
2 defaultdict(<type 'list'>, {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]})
#3
0
I personally think that this is a bit more readable, constructs fewer objects, and I would guess runs faster. It records each run as a (start index, value, length) tuple:
li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
results = []
i = 0
while i < len(li):
    j = i + 1
    while j < len(li) and li[i] == li[j]:
        j += 1
    results.append((i, li[i], j - i))
    i = j
print results  # [(0, -1, 2), (2, 2, 2), (4, -1, 1), (5, 1, 5), (10, -1, 2)]
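The loop above yields the runs themselves rather than the dictionary the question asks for. One possible way to bridge the two is sketched below, assuming Python 3 and n >= 1; runs and sub_seq_from_runs are hypothetical helper names introduced here, not part of the original answer.

```python
from collections import defaultdict

def runs(li):
    """Return (start, value, length) for each run of equal values."""
    results = []
    i = 0
    while i < len(li):
        j = i + 1
        while j < len(li) and li[i] == li[j]:
            j += 1
        results.append((i, li[i], j - i))
        i = j
    return results

def sub_seq_from_runs(li, n):
    """Hypothetical helper: map each value to the indices that start n equal items."""
    ans = defaultdict(list)
    for start, value, length in runs(li):
        # A run of `length` items contains length - n + 1 windows of size n;
        # the range is empty when the run is shorter than n.
        ans[value].extend(range(start, start + length - n + 1))
    # Drop values that ended up with no qualifying index.
    return {k: v for k, v in ans.items() if v}

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
for n in (5, 4, 3, 2):
    print(n, sub_seq_from_runs(li, n))
```

For the sample list this should give the same index lists as the question's function, for example 1: [5, 6, 7] when n is 3.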