Converting a TiddlyWiki list into a Python list

Internally, TiddlyWiki stores a list of tags as a space-separated string, using [[ and ]] to delimit multi-word tags. That is, a list of the tags foo, ram doo, bar and very cool becomes a string like this in TiddlyWiki:

"foo [[ram doo]] bar [[very cool]]"

How can I transform that into a Python list that looks like this:

['foo', 'ram doo', 'bar', 'very cool']

"foo [[ram doo]] bar".split() does not work for me.
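
For reference, a quick sketch of what plain split() returns for this input: it breaks on every run of whitespace, so the bracketed tag is cut in two and the brackets stay attached to the pieces.

```python
s = "foo [[ram doo]] bar"

# str.split() with no arguments splits on whitespace only;
# it knows nothing about the [[ ]] delimiters.
parts = s.split()
print(parts)  # ['foo', '[[ram', 'doo]]', 'bar']
```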

4 solutions

#1

With regex:

import re
a = "foo [[ram doo]] bar [[very cool]] something else"
pattern = re.compile(r'\[\[[^\]]+\]\]|[^\[\] ]+')
print([i.strip(' []') for i in pattern.findall(a)])

Prints ['foo', 'ram doo', 'bar', 'very cool', 'something', 'else']

The regex basically tokenizes the string (a token is either a [[..]] group or a run of characters that are neither spaces nor brackets, tried in that order); the list comprehension then strips the brackets from the tokens.
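
A variant of the same idea, as a minimal sketch: with capture groups the brackets never end up in the result, so the strip step is not needed. The names parse_tags and TAG_PATTERN are hypothetical, and the pattern is only one possible choice (it is not TiddlyWiki's own parser); it assumes bracketed tags are separated from their neighbours by spaces.

```python
import re

# Hypothetical names for illustration only.
# First alternative: a [[...]] tag, capturing the text inside the brackets.
# Second alternative: a plain run of non-whitespace characters.
TAG_PATTERN = re.compile(r'\[\[(.+?)\]\]|(\S+)')

def parse_tags(text):
    """Return the tags of a TiddlyWiki-style tag string as a list."""
    # findall yields (bracketed, plain) tuples; exactly one side is
    # non-empty for each match, so `or` picks the one that matched.
    return [bracketed or plain for bracketed, plain in TAG_PATTERN.findall(text)]

print(parse_tags("foo [[ram doo]] bar [[very cool]] something else"))
# ['foo', 'ram doo', 'bar', 'very cool', 'something', 'else']
```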

#2

A simple regular expression works:

>>> import re
>>> [x.strip() for x in re.split(r'\[\[|\]\]', "foo [[ram doo]] bar [[very cool]]") if x]
['foo', 'ram doo', 'bar', 'very cool']

#3

Two lines of code, without regular expressions. This assumes that no tag contains '*'; because it splits on the delimiters alone, two adjacent single-word tags would stay joined:

>>> s = "foo [[ram doo]] bar [[very cool]]"
>>> [x.strip() for x in s.replace('[[', '*').replace(']]', '*').split('*') if x.strip()]
['foo', 'ram doo', 'bar', 'very cool']

#4

You can also do it the following way, without using re.

Obviously, using re would be more efficient; this answer only demonstrates that it can be done with split().

[EDITED based on comment]

my_string = "foo [[ram doo]] bar [[very cool]]"

# also works for the following strings
#my_string = "foo [[ram doo]] bar [[very cool]] something else"
#my_string = "something else"
#my_string = "foo bar [[ram doo]]" ##<-- this is the border case
#my_string = "[[ram doo]] foo bar"
#my_string = "foo [[ram doo]] bar "

# set "splitting string"
s1 = ']]'
s2 = '[['
if my_string[-2::] == ']]' and my_string.count(']]') == 1:
    # reverse splitting string for border case
    s1 = '[['
    s2 = ']]'

# split on s1 only if s1 in string
my_list1 = [a if s1 in my_string else my_string for a in my_string.split(s1)]

# split each element on s2 or space
my_list2 = [x.split(s2) if s2 in x else x.split(' ') for x in my_list1]

# flatten lists in lists, and strip spaces
my_list3 = [a.strip(' ') for b in my_list2 for a in (b if isinstance(b, list) else [b])]

# get rid of empties
my_list4 = [a for a in my_list3 if a != '']
print(my_list4)
# will output
# ['foo', 'ram doo', 'bar', 'very cool']

So the conclusion is to use re.
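
As a rough check, the sample strings listed in the comments above can be run through the regex from the first answer. This is only a sketch: the helper name tags and the cases table are introduced here for illustration.

```python
import re

# Pattern taken from the first answer: a [[...]] group, or a run of
# characters that are neither brackets nor spaces.
pattern = re.compile(r'\[\[[^\]]+\]\]|[^\[\] ]+')

def tags(text):
    # Hypothetical helper: tokenize, then strip brackets and spaces.
    return [token.strip(' []') for token in pattern.findall(text)]

# Input strings from the comments above, with the lists we expect back.
cases = {
    "foo [[ram doo]] bar [[very cool]]": ['foo', 'ram doo', 'bar', 'very cool'],
    "foo [[ram doo]] bar [[very cool]] something else":
        ['foo', 'ram doo', 'bar', 'very cool', 'something', 'else'],
    "something else": ['something', 'else'],
    "foo bar [[ram doo]]": ['foo', 'bar', 'ram doo'],
    "[[ram doo]] foo bar": ['ram doo', 'foo', 'bar'],
    "foo [[ram doo]] bar ": ['foo', 'ram doo', 'bar'],
}

for text, expected in cases.items():
    assert tags(text) == expected, (text, tags(text))
```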
