I begin with a list of words like ["ONE","TWO","THREE","FOUR"].
Later, I join the list to make a string: "ONETWOTHREEFOUR". I do some processing on this string and get a list of indices, say [6,7,8,0,4] (which maps onto that string to give me the word "THROW", though as pointed out in the comments that's irrelevant to my question).
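For concreteness, here is a minimal sketch of the setup described above. The names joined and picked are illustrative only and are not part of the original code.

```python
wordlist = ["ONE", "TWO", "THREE", "FOUR"]
stringpositions = [6, 7, 8, 0, 4]

# Join the words into one string; the word boundaries are no longer visible in it.
joined = "".join(wordlist)

# Read the letters back out by position.
picked = "".join(joined[pos] for pos in stringpositions)

print(joined)  # ONETWOTHREEFOUR
print(picked)  # THROW
```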
Now I want to know which items from the original list gave me the letters I am using to make my word. I know I used letters [6,7,8,0,4] from the joined string. Based on that list of string indices, I want the output {0,1,2}, because I used letters from every word in the original list except "FOUR".
What I've tried so far:
wordlist = ["ONE","TWO","THREE","FOUR"]
stringpositions = [6,7,8,0,4]
wordlengths = tuple(len(w) for w in wordlist) #->(3, 3, 5, 4)
wordstarts = tuple(sum(wordlengths[:i]) for i in range(len(wordlengths))) #->(0, 3, 6, 11)
words_used = set()
for pos in stringpositions:
    prev = 0
    for wordnumber,wordstart in enumerate(wordstarts):
        if pos < wordstart:
            words_used.add(prev)
            break
        prev = wordnumber
    else:
        # no break: pos falls in the last word
        words_used.add(prev)
It seems awfully long-winded. What's the best (and/or most Pythonic) way for me to do this?
2 Answers
#1
As clarified in the comments, the OP's goal is to figure out which words were used based on which string positions were used, rather than which letters were used, so the word/substring THROW is basically irrelevant.
Here's a very short version:
from itertools import chain

wordlist = ["ONE","TWO","THREE","FOUR"]
string = ''.join(wordlist) # "ONETWOTHREEFOUR"
stringpositions = [6,7,8,0,4]

# construct a list that maps every position in string to a single source word
# (chain.from_iterable flattens the per-word lists; plain chain(generator) would not)
which_word = list(chain.from_iterable( [ii]*len(w) for ii, w in enumerate(wordlist) ))

# it's now trivial to use which_word to construct the set of words
# represented in the list stringpositions
words_used = set( which_word[pos] for pos in stringpositions )

print("which_word=", which_word)
print("words_used=", words_used)
==>
which_word= [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
words_used= {0, 1, 2}
EDIT: Updated to use list(itertools.chain.from_iterable(generator)) rather than sum(generator, []), as suggested by @inspectorG4dget in the comments.
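To make the flattening step easier to check, here is a small sketch comparing the ways of concatenating the per-word lists. The variable names (pieces, flat_sum, flat_chain, flat_star) are made up for this illustration. As far as I know, sum(..., []) builds a new list at every step, so it tends to get slow for many words, while chain.from_iterable walks the pieces lazily.

```python
from itertools import chain

wordlist = ["ONE", "TWO", "THREE", "FOUR"]

# One list per word, each holding that word's index repeated len(word) times.
pieces = [[ii] * len(w) for ii, w in enumerate(wordlist)]

flat_sum = sum(pieces, [])                      # repeated list concatenation
flat_chain = list(chain.from_iterable(pieces))  # lazy flattening of an iterable of lists
flat_star = list(chain(*pieces))                # same result, unpacking the pieces as arguments

print(flat_sum == flat_chain == flat_star)  # True

# Passing the iterable itself as a single argument does NOT flatten it:
print(list(chain(pieces)) == pieces)  # True
```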
#2
Here's the easiest way. If you want to be more space-efficient, you might want to use some sort of binary search tree:
wordlist = ["ONE","TWO","THREE","FOUR"]
top = 0
inds = {}
for i,word in enumerate(wordlist):
    for k in range(top, top+len(word)):
        inds[k] = i
    top += len(word)

#do some magic
L = [6,7,8,0,4]
for i in L: print(inds[i])
Output:
2
2
2
0
1
You could of course call set() on the output if you wanted to.
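As a rough sketch of the more space-efficient idea mentioned above: instead of a tree, one option is a binary search over the cumulative word lengths using the standard bisect module. This is a swapped-in technique, not the code from this answer, and the names word_ends and word_index are hypothetical. It stores one number per word rather than one entry per character.

```python
from bisect import bisect_right
from itertools import accumulate

wordlist = ["ONE", "TWO", "THREE", "FOUR"]
stringpositions = [6, 7, 8, 0, 4]

# word_ends[i] is the position just past the last letter of word i.
word_ends = list(accumulate(len(w) for w in wordlist))  # [3, 6, 11, 15]

def word_index(pos):
    """Return the index of the word that supplied the letter at string position pos."""
    if not 0 <= pos < word_ends[-1]:
        raise IndexError("position outside the joined string")
    # The number of word ends that are <= pos equals the index of the owning word.
    return bisect_right(word_ends, pos)

words_used = {word_index(pos) for pos in stringpositions}
print(sorted(words_used))  # [0, 1, 2]
```

Each lookup costs a binary search over the words instead of a dictionary access, so this trades a little time for memory.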