How to split a unicode string into a list [duplicate]

Time: 2021-10-21 03:54:09

I have the following code:

stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]

My output is:

۰۱۲۳۴۵۶۷۸۹

But when I use:

print strlist[1]

I get the following traceback:

IndexError: list index out of range

My question is: how can I split my string into its individual characters? Keep in mind that I get the string from a function, so treat it as a variable.

3 Solutions

#1


9  

By default, the split() method splits on whitespace. Your string contains no whitespace, so strlist is a list with a single element: the whole string, in strlist[0]. That is why strlist[1] raises IndexError.
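
The whitespace behaviour can be checked with a small sketch. It uses escapes so that it runs unchanged on Python 2.7 and Python 3; the names `digits`, `parts` and `spaced` are illustrative and do not come from the question:

```python
# The ten digits from the question, written as escapes (U+06F0..U+06F9)
# so the snippet does not depend on the source file's encoding.
digits = u"\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"

# No whitespace in the string, so split() returns a one-element list.
parts = digits.split()
print(len(parts))            # 1
print(parts[0] == digits)    # True

# With a space between the characters, split() does separate them.
spaced = u" ".join(digits)
print(len(spaced.split()))   # 10
```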

If you want a list with one element per Unicode code point, you can build it in several ways:

  • Function: list(stru.decode("utf-8"))
  • List comprehension: [item for item in stru.decode("utf-8")]
  • Don't convert at all. Do you really need a list? You can iterate over the unicode string just like over any other sequence type (for character in stru.decode("utf-8"): ...)
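
The three options above can be compared side by side in a minimal sketch. It assumes the function hands back UTF-8 encoded bytes; here `stru` is built by encoding a literal as a hypothetical stand-in for that return value, and `text`, `decoded`, `chars`, `chars_comp` and `seen` are illustrative names. The code runs on Python 2.7 and Python 3:

```python
# The ten digits from the question, written as escapes (U+06F0..U+06F9).
text = u"\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"

# Hypothetical stand-in for the UTF-8 byte string returned by the asker's function.
stru = text.encode("utf-8")

# Decode once, then work with the unicode string.
decoded = stru.decode("utf-8")

# Option 1: list() yields one element per code point.
chars = list(decoded)

# Option 2: a list comprehension gives the same result.
chars_comp = [item for item in decoded]

# Option 3: no list at all, iterate over the string directly.
seen = 0
for character in decoded:
    seen += 1

print(len(stru))                 # 20 (each of these digits takes two bytes in UTF-8)
print(len(chars))                # 10
print(chars == chars_comp)       # True
print(chars[1] == u"\u06f1")     # True
```

Decoding first matters: the byte string is twice as long as the text, so indexing or listing it without decoding would give bytes rather than characters.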

#2


14  

  1. You don't need to. A unicode string can be indexed directly:

    >>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
    ۱
    
  2. If you still want to, pass it to list():

    >>> list(u"۰۱۲۳۴۵۶۷۸۹")
    [u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
    

#3


6  

You can do this:

list(stru.decode("utf-8"))
