I'm writing code to convert a CSV into XML. Suppose I have a single list like:
input = ['name', 'val', 0, \
'name', 'val', 1, 'tag', 'val', \
'name', 'val', 2, 'tag', 'val', 'tag', 'val', \
'name', 'val', 0]
Every slice of this list starting with 'name' denotes an element with a name, a value, and a variable number of optional tag-value pairs. The integer after the value is the number of tag-value pairs that follow.
I want to turn this into:
output = [['name', 'val', []],
['name', 'val', ['tag', 'val']],
['name', 'val', ['tag', 'val', 'tag', 'val']],
['name', 'val', []]]
There is no need to separate the tag-value pairs into tuples; that is handled in a separate method. I have a solution, but it's not very Pythonic:
output = []
cursor = 0
while cursor < len(input):
    name = input[cursor]
    val = input[cursor + 1]
    ntags = int(input[cursor + 2])  # number of tag-value pairs that follow
    optslice = input[cursor + 3:cursor + 3 + 2 * ntags]
    cursor = cursor + 3 + 2 * ntags  # jump to the next 'name'
    print(name, val, ntags, optslice, cursor)
    output.append([name, val, optslice])
print(output)
> name val 0 [] 3
> name val 1 ['tag', 'val'] 8
> name val 2 ['tag', 'val', 'tag', 'val'] 15
> name val 0 [] 18
> [['name', 'val', []], ['name', 'val', ['tag', 'val']], ['name', 'val', ['tag', 'val', 'tag', 'val']], ['name', 'val', []]]
I think I could probably do this as a list comprehension, but the variable length of each element is throwing me for a loop. The input is parsed from a CSV, and I can change the format to better fit a different solution. Ideas?
4 Solutions
#1
6
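I'd use an iterator instead of your cursor and then drive the comprehension with "for name in it". Each next(it) call inside the comprehension consumes the following item from the same iterator, so the outer loop only ever sees the 'name' entries.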
it = iter(input)
output = [[name, next(it), [next(it) for _ in range(2 * next(it))]] for name in it]
Or with islice:
from itertools import islice
it = iter(input)
output = [[name, next(it), list(islice(it, 2 * next(it)))] for name in it]
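For reference, here is a self-contained sketch of the same iterator idea applied to the sample data from the question. The names flat and group_records are illustrative and not part of the original post, and raising ValueError on truncated input is an added assumption about how bad data should be handled.

```python
from itertools import islice


def group_records(flat):
    """Group a flat [name, val, count, tag, val, ...] list into records.

    Hypothetical helper: the name and the error handling are illustrative.
    """
    it = iter(flat)
    records = []
    for name in it:
        try:
            val = next(it)
            ntags = int(next(it))
        except StopIteration:
            raise ValueError("record %r is missing its value or count" % (name,)) from None
        # Take exactly 2 * ntags items (tag, val, tag, val, ...) from the shared iterator.
        tags = list(islice(it, 2 * ntags))
        if len(tags) != 2 * ntags:
            raise ValueError("record %r is missing tag-value items" % (name,))
        records.append([name, val, tags])
    return records


flat = ['name', 'val', 0,
        'name', 'val', 1, 'tag', 'val',
        'name', 'val', 2, 'tag', 'val', 'tag', 'val',
        'name', 'val', 0]
output = group_records(flat)
print(output)
```

The explicit loop is longer than the one-line comprehension, but it reports a truncated list instead of silently returning a short final record.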
That said, I suspect you shouldn't have all the data in that flat list in the first place. Your CSV file likely has structure that you should use instead. In other words, don't flatten two-dimensional data only to unflatten it again. But your question is interesting nonetheless :-)
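To illustrate working from the rows directly, here is a minimal sketch. It assumes a hypothetical layout in which each CSV row holds a name, a value, and then zero or more tag-value fields. The question does not show the actual file, so the sample text and the name rows_to_records are assumptions.

```python
import csv
import io

# Hypothetical CSV layout (an assumption): name, value, then optional tag/value fields.
sample = (
    "name,val\n"
    "name,val,tag,val\n"
    "name,val,tag,val,tag,val\n"
    "name,val\n"
)


def rows_to_records(lines):
    """Turn CSV rows into [name, val, [tag, val, ...]] records."""
    records = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, val, *tags = row
        records.append([name, val, tags])
    return records


records = rows_to_records(io.StringIO(sample))
print(records)
```

With one element per row, the row length already says how many tag-value fields there are, so the explicit count is no longer needed.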
#2
1
I don't know how Pythonic you consider this, but you could do something like this (the starred assignment requires Python 3):
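finallist = []
therest = input  # the flat list from the question
while therest:
    # Peel off one header; everything else stays in therest.
    name, val, count, *therest = therest
    # The first 2 * count items are this element's tag-value pairs.
    sublist, therest = therest[:2*count], therest[2*count:]
    finallist.append([name, val] + [sublist])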
#3
0
Here is my code:
data = ['name', 'val', 0,
        'name', 'val', 1, 'tag', 'val',
        'name', 'val', 2, 'tag', 'val', 'tag', 'val',
        'name', 'val', 0]
# For every 'name' marker, take the name/value pair and the 2 * count items after the count.
tmp = [
    [
        data[pos:pos + 2],
        data[pos + 3:pos + 3 + data[pos + 2] * 2]
    ] for pos, e in enumerate(data) if e == 'name']
for e in tmp:
    print(e)
The output is:
# [['name', 'val'], []]
# [['name', 'val'], ['tag', 'val']]
# [['name', 'val'], ['tag', 'val', 'tag', 'val']]
# [['name', 'val'], []]
#4
0
If you really want to use pure list comprehension:
a = ['name', 'val', 0,
     'name', 'val', 1, 'tag', 'val',
     'name', 'val', 2, 'tag', 'val', 'tag', 'val',
     'name', 'val', 0]
print(
    # Keep name and val, drop the count, and nest the remaining tag-value items.
    [grouped[:2] + [grouped[3:]] for grouped in
        [
            # Slice from each 'name' up to the next 'name' (or to the end of the list).
            a[i:i+(a[i+1:].index("name") + 1 if a[i+1:].count("name") else len(a[i:])+1)]
            for i, x in enumerate(a) if x == "name"
        ]
    ])
It is really ugly though.