I want my Python function to split a sentence (input) and store each word in a list. The code that I've written so far splits the sentence, but does not store the words as a list. How do I do that?
def split_line(text):
    # split the text
    words = text.split()

    # for each word in the line:
    for word in words:
        # print the word
        print(word)
8 solutions
#1
357
text.split()
This should be enough to store each word in a list. words is already a list of the words from the sentence, so there is no need for the loop.
Also, it might just be a typo, but your loop is a little mixed up. If you really did want to use append, it would be:
words.append(word)
not
word.append(words)
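A minimal sketch of the question's split_line rewritten to return the list instead of printing it. The split_line_with_append name is hypothetical; it shows the append route from this answer, building a separate, initially empty list, because appending to words itself while looping over it would keep growing the list and the loop would never finish.

```python
def split_line(text):
    """Return the words of text as a list; str.split() already builds it."""
    return text.split()


def split_line_with_append(text):
    """Hypothetical variant that builds the same list by hand with append()."""
    result = []  # a new, empty list; never append to the list being iterated
    for word in text.split():
        result.append(word)
    return result


words = split_line("the quick brown fox")
print(words)  # ['the', 'quick', 'brown', 'fox']
print(words == split_line_with_append("the quick brown fox"))  # True
```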
#2
345
Split the string in text on any run of consecutive whitespace.
words = text.split()
Split the string in text on the delimiter ",".
words = text.split(",")
The words variable will be a list containing the words from text, split on the delimiter.
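The difference between the two calls shows up on input with spaces after the commas or with empty fields. The sample string below is an assumption chosen to make that visible: with an explicit "," separator, surrounding spaces are kept and adjacent delimiters produce empty strings.

```python
text = "red, green,,blue"

# No argument: split on runs of whitespace only.
print(text.split())      # ['red,', 'green,,blue']

# Explicit delimiter: split on every single ",".
print(text.split(","))   # ['red', ' green', '', 'blue']

# A common follow-up: strip surrounding spaces and drop empty fields.
fields = [part.strip() for part in text.split(",") if part.strip()]
print(fields)            # ['red', 'green', 'blue']
```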
#3
70
str.split()
Return a list of the words in the string, using sep as the delimiter ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
>>> line = "a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']
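To see the "different splitting algorithm" described in the quoted documentation, compare omitting sep with passing a single space. The input string is an assumed example with leading, doubled, and trailing spaces.

```python
line = " one  two "

# sep omitted: runs of whitespace count as one separator, and
# leading/trailing whitespace produces no empty strings.
by_whitespace = line.split()
print(by_whitespace)  # ['one', 'two']

# sep=" ": every single space is a separator, so empty strings appear.
by_space = line.split(" ")
print(by_space)       # ['', 'one', '', 'two', '']
```

This is why split() with no argument is usually the safer choice for free-form sentences, while split(" ") only behaves well on input with exactly one space between words.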
#4
40
Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem:
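import nltk

# raw_sentence is your input string; the tokenizer models may need to be
# fetched once with nltk.download() before word_tokenize will run.
words = nltk.word_tokenize(raw_sentence)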
This has the added benefit of splitting out punctuation.
Example:
>>> import nltk
>>> s = "The fox's foot grazed the sleeping dog, waking it."
>>> words = nltk.word_tokenize(s)
>>> words
['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',', 'waking', 'it', '.']
This allows you to filter out any punctuation you don't want and use only words.
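Once you have the token list, the filtering step is plain Python. The sketch below hard-codes the token list from the example above so it runs without NLTK installed, and treats a token as punctuation when every character is in string.punctuation; that rule and the is_punctuation helper are assumptions for illustration, so adjust them to your needs.

```python
import string

# Token list copied from the nltk.word_tokenize example above.
tokens = ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping',
          'dog', ',', 'waking', 'it', '.']


def is_punctuation(token):
    """Hypothetical helper: True if every character is ASCII punctuation."""
    return all(ch in string.punctuation for ch in token)


words = [tok for tok in tokens if not is_punctuation(tok)]
print(words)
# ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', 'waking', 'it']
```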
Please note that the other solutions using str.split() are better if you don't plan on doing any complex manipulation of the sentence.
#5
22
How about this algorithm? Split text on whitespace, then trim punctuation. This carefully removes punctuation from the edges of words without harming apostrophes inside words such as we're.
>>> text
"'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"
>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]
>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
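One edge case with this approach: a token made entirely of punctuation, such as a standalone "--", strips down to an empty string. A minimal sketch that wraps the same strip step in a function and drops those empties; the function name and sample sentence are hypothetical.

```python
import string


def words_without_edge_punctuation(text):
    """Split on whitespace, strip punctuation from each token's edges,
    and drop tokens that were nothing but punctuation."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    return [word for word in stripped if word]


print(words_without_edge_punctuation("Wait -- we're done?"))
# ['Wait', "we're", 'done']
```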
#6
14
I want my python function to split a sentence (input) and store each word in a list
The str.split() method does this: it takes a string and splits it into a list:
>>> the_string = "this is a sentence"
>>> words = the_string.split(" ")
>>> print(words)
['this', 'is', 'a', 'sentence']
>>> type(words)
<type 'list'>   # or <class 'list'> in Python 3
The problem you're having is due to a typo: you wrote print(words) instead of print(word):
Renaming the word variable to current_word, this is what you had:
def split_line(text):
    words = text.split()
    for current_word in words:
        print(words)
...when you should have done:
def split_line(text):
    words = text.split()
    for current_word in words:
        print(current_word)
If for some reason you want to manually construct a list in the for loop, you would use the list append() method, perhaps because you want to lower-case all words (for example):
my_list = []  # make empty list
for current_word in words:
    my_list.append(current_word.lower())
Or, a bit more neatly, using a list comprehension:
my_list = [current_word.lower() for current_word in words]
#7
11
shlex has a .split() function. It differs from str.split() in that it does not preserve quotes and treats a quoted phrase as a single word:
>>> import shlex
>>> shlex.split("sudo echo 'foo && bar'")
['sudo', 'echo', 'foo && bar']
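A side-by-side comparison makes the difference explicit. The command string below is an assumed example. Note also that shlex.split() raises ValueError on an unmatched quote, which plain str.split() never does, so the sketch shows one way to guard against that.

```python
import shlex

command = 'grep "hello world" notes.txt'

# str.split() breaks the quoted phrase apart and keeps the quote characters.
print(command.split())
# ['grep', '"hello', 'world"', 'notes.txt']

# shlex.split() keeps the quoted phrase together and removes the quotes.
print(shlex.split(command))
# ['grep', 'hello world', 'notes.txt']

# Unbalanced quotes are a parse error for shlex.
try:
    shlex.split('echo "unterminated')
except ValueError as exc:
    print("shlex could not parse the input:", exc)
```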
#8
3
I think you are confused because of a typo.
Replace print(words) with print(word) inside your loop to have every word printed on a different line.