I have a list of strings:
['splitter001','stringA','stringB','splitter_1234','stringC']
and I want my end result to be:
[ ['splitter001','stringA','stringB'] , ['splitter_1234','stringC'] ]
The splitter strings are not identical to one another.
I've tried using .find to locate 'splitter' when the element index is > 0, then deleting the slice [:2nd splitter] and appending the first group to a new list, but this doesn't work properly.
I am iterating over all the strings with a for loop, and it doesn't work for the second group, so I get:
[ ['splitter001','stringA','stringB'] ] as my new list, but the second group is missing.
I've read many answers on this topic and the closest solution was to use:
[list(x[1]) for x in itertools.groupby(myList, lambda x: x=='#') if not x[0]]
but I do not understand this syntax... I've read about groupby and itertools, but I'm not sure this is helpful for my situation.
4 Answers
#1
3
Here's one way to do this with groupby. We tell groupby to look for strings that start with 'splitter'. This creates two kinds of groups: strings that start with 'splitter', and all the other strings. For example:
from itertools import groupby
data = ['splitter001','stringA','stringB','splitter_1234','stringC']
for k, g in groupby(data, key=lambda s: s.startswith('splitter')):
    print(k, list(g))
output
True ['splitter001']
False ['stringA', 'stringB']
True ['splitter_1234']
False ['stringC']
So we can put those groups into two lists and then zip them together to make the final list.
from itertools import groupby
data = ['splitter001','stringA','stringB','splitter_1234','stringC']
head = []
tail = []
for k, g in groupby(data, key=lambda s: s.startswith('splitter')):
    if k:
        head.append(list(g))
    else:
        tail.append(list(g))
out = [u+v for u, v in zip(head, tail)]
print(out)
output
[['splitter001', 'stringA', 'stringB'], ['splitter_1234', 'stringC']]
Here's a more compact way to do the same thing, using a list of lists to store the head and tail lists:
from itertools import groupby
data = ['splitter001','stringA','stringB','splitter_1234','stringC']
results = [[], []]
for k, g in groupby(data, key=lambda s: s.startswith('splitter')):
    results[k].append(list(g))
out = [v+u for u, v in zip(*results)]
print(out)
output
[['splitter001', 'stringA', 'stringB'], ['splitter_1234', 'stringC']]
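In case the results[k] indexing looks odd: it relies on bool being a subclass of int in Python, so False selects index 0 and True selects index 1. A tiny sketch of just that step (the 'head' and 'tail' labels are placeholders for illustration, not part of the code above):

```python
# bool is a subclass of int, so False == 0 and True == 1 when used as an index.
results = [[], []]
results[False].append('tail group')   # lands in results[0]
results[True].append('head group')    # lands in results[1]
print(results)

# zip(*results) then yields (tail, head) pairs, which is why the
# compact version concatenates v + u rather than u + v.
pairs = list(zip(*results))
print(pairs)
```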
If you want to print each sublist on a separate line, the simple way is to do it with a for loop instead of creating the out list.
for u, v in zip(*results):
    print(v + u)
output
['splitter001', 'stringA', 'stringB']
['splitter_1234', 'stringC']
Another way is to convert the sublists to strings and then join them together with newlines to create one big string.
print('\n'.join([str(v + u) for u, v in zip(*results)]))
This final variation stores both kinds of groups into a single iterator object. I think you'll agree that the previous versions are easier to read. :)
it = iter(list(g) for k, g in groupby(data, key=lambda s: s.startswith('splitter')))
out = [u+v for u, v in zip(it, it)]
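The zip(it, it) idiom works because both arguments are the same iterator, so each step of zip pulls two consecutive groups from it. Here is a small sketch that makes the pairing visible; the intermediate name pairs is only there for illustration. Note that this, like the versions above, assumes the list starts with a splitter and that no two splitters are adjacent, since otherwise the head and tail groups would not alternate cleanly.

```python
from itertools import groupby

data = ['splitter001', 'stringA', 'stringB', 'splitter_1234', 'stringC']

# One iterator that yields the groups in order: head, tail, head, tail, ...
it = iter(list(g) for k, g in groupby(data, key=lambda s: s.startswith('splitter')))

# zip draws from the same iterator twice per step, so u is a head group
# and v is the tail group that immediately follows it.
pairs = list(zip(it, it))
out = [u + v for u, v in pairs]

print(pairs)
print(out)
```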
#2
2
Get the indices of the elements that start with 'splitter', then slice the list at those indices:
sl = ['splitter001','stringA','stringB','splitter_1234','stringC']
si = [i for i, e in enumerate(sl) if e.startswith('splitter')]
[sl[i:j] for i, j in zip(si, si[1:] + [len(sl)])]
Out[66]: [['splitter001', 'stringA', 'stringB'], ['splitter_1234', 'stringC']]
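If it helps to see the pairing spelled out, the same idea can be wrapped in a function. This is only a sketch: the name split_on_prefix and the prefix parameter are made up for illustration. Each start index is paired with the next start index, and the last one is paired with len(items). Anything that appears before the first splitter is dropped by this approach.

```python
def split_on_prefix(items, prefix='splitter'):
    """Slice items into sublists, each starting at an element with the given prefix."""
    starts = [i for i, e in enumerate(items) if e.startswith(prefix)]
    # Every group ends where the next one starts; the last ends at len(items).
    ends = starts[1:] + [len(items)]
    return [items[i:j] for i, j in zip(starts, ends)]


sl = ['splitter001', 'stringA', 'stringB', 'splitter_1234', 'stringC']
print(split_on_prefix(sl))

# Two adjacent splitters simply give a one-element group.
print(split_on_prefix(['splitter1', 'splitter2', 'a']))
```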
#3
1
Here is an approach using a for loop, as you mentioned you tried, that handles the case of the second group:
# define list of strings for input
strings = ['splitter001','stringA','stringB','splitter_1234','stringC']
split_strings = []  # this is going to hold the final output
current_list = []  # this is a temporary list
# loop over strings in the input
for s in strings:
    if 'splitter' in s:
        # if current_list is not empty
        if current_list:
            split_strings.append(current_list)  # append to output
            current_list = []  # reset current_list
    current_list.append(s)
# outside of the loop, append the leftover strings (if any)
if current_list:
    split_strings.append(current_list)
The key here is that you do one more append at the end, outside of your loop, to capture the last group.
Output:
[['splitter001', 'stringA', 'stringB'], ['splitter_1234', 'stringC']]
EDIT: Adding explanation of code.
We create a temp variable current_list to hold each list that we will append to the final output split_strings.
Loop over the strings in the input. For each string s, check if it contains 'splitter'. If it does AND the current_list is not empty, this means that we've hit the next delimiter. Append current_list to the output and clear it out so we can begin collecting items for the next set of strings.
After this check, append the current string to current_list. This works because we cleared it out (setting it equal to []) after we found a delimiter.
At the end of the list, we append whatever is left over to the output, if anything.
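The same accumulate-and-flush steps can also be written as a generator, if you would rather not manage the output list by hand. This is just a sketch of the loop described above; the function name split_groups and the marker parameter are invented for the example.

```python
def split_groups(strings, marker='splitter'):
    """Yield sublists, starting a new one each time a string contains marker."""
    current = []
    for s in strings:
        # A marker with a non-empty current list means the previous group is done.
        if marker in s and current:
            yield current
            current = []
        current.append(s)
    # Flush the final group, which no later marker will ever close.
    if current:
        yield current


strings = ['splitter001', 'stringA', 'stringB', 'splitter_1234', 'stringC']
split_strings = list(split_groups(strings))
print(split_strings)
```

Unlike the index-slicing approach, any strings that come before the first splitter are kept here as their own leading group.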
#4
1
You can try something like this: first get the index of each place where a splitter appears, then chunk the list according to those indices:
sl = ['splitter001','stringA','stringB','splitter_1234','stringC']
si = [index for index, value in enumerate(sl) if value.startswith('splitter')]
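for i in range(len(si)):
    # start index and, if there is one, the index of the next splitter
    # (named bounds to avoid shadowing the built-in slice)
    bounds = si[i:i+2]
    if len(bounds) == 2:
        print(sl[bounds[0]:bounds[1]])
    else:
        print(sl[bounds[0]:])  # last group runs to the end of the list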
output:
['splitter001', 'stringA', 'stringB']
['splitter_1234', 'stringC']