How does numpy's fancy indexing work?

Time: 2022-06-16 21:26:21

I was doing a little experimentation with 2D lists and numpy arrays. From this, I've raised 3 questions I'm quite curious to know the answer for.

First, I initialized a 2D python list.

>>> my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

I then tried indexing the list with a tuple.

>>> my_list[:,]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple

Since the interpreter throws me a TypeError and not a SyntaxError, I surmised it is actually possible to do this, but python does not natively support it.

I then tried converting the list to a numpy array and doing the same thing.

>>> np.array(my_list)[:,]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Of course, no problem. My understanding is that one of the __xx__() methods has been overridden and implemented in the numpy package.

Numpy's indexing supports lists too:

>>> np.array(my_list)[:,[0, 1]]
array([[1, 2],
       [4, 5],
       [7, 8]])

This has raised a couple of questions:

  1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?
  2. Why don't python lists natively support fancy indexing?

Furthermore, I ran this code to compare slicing performance on python2 vs python3.

import timeit

print(timeit.timeit("list_1[:][:]", 
      setup="list_1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]"))

print(timeit.timeit("list_2[:,]", 
      setup="import numpy as np; list_2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])"))

Python2 (running numpy version 1.8.0rc1):

0.352098941803
1.24272298813

Python3 (running numpy version 1.12.0):

0.23113773498334922
0.20699498101021163

This brings me to:

  3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Let me know if I can clarify anything. Thanks.


Edit

Config for numpy on python2:

>>> np.show_config()
...
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/BuildRoot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.Internal.sdk/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
...
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3)]
...

And the same for python3:

>>> np.show_config()
....
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
....
lapack_opt_info:
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]

3 answers

#1


22  

You have three questions:

1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?

The indexing operator [] is overridable using __getitem__, __setitem__, and __delitem__. It can be fun to write a simple subclass that offers some introspection:

>>> class VerboseList(list):
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
...

Let's make an empty one first:

>>> l = VerboseList()

Now fill it with some values. Note that we haven't overridden __setitem__ so nothing interesting happens yet:

>>> l[:] = range(10)

Now let's get an item. The item at index 0 is 0, so the first line of output is the printed key and the second is the returned value:

>>> l[0]
0
0

If we try to use a tuple, we get an error, but we get to see the tuple first!

>>> l[0, 4]
(0, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getitem__
TypeError: list indices must be integers or slices, not tuple

We can also find out how python represents slices internally:

>>> l[1:3]
slice(1, 3, None)
[1, 2]

There are lots more fun things you can do with this object -- give it a try!
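
As one such experiment, here is a minimal sketch that logs assignment and deletion as well as retrieval. The class name LoggingList and its log attribute are made up for illustration; only __getitem__, __setitem__ and __delitem__ are real hooks.

```python
class LoggingList(list):
    """Hypothetical list subclass that records every subscript key it receives."""

    def __init__(self, *args):
        super().__init__(*args)
        self.log = []  # (operation, key) pairs, in call order

    def __getitem__(self, key):
        self.log.append(("get", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.log.append(("set", key))
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self.log.append(("del", key))
        super().__delitem__(key)


items = LoggingList(range(10))
items[0] = 99          # calls __setitem__(0, 99)
del items[1:3]         # calls __delitem__(slice(1, 3, None))
first = items[0]       # calls __getitem__(0)
print(items.log)
```

All three statements use the same [] syntax; which hook runs depends only on whether the subscript is being read, assigned to, or deleted.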

2. Why don't python lists natively support fancy indexing?

This is hard to answer. One way of thinking about it is historical: because the numpy developers thought of it first.

You youngsters. When I was a kid...

Upon its first public release in 1991, Python had no numpy library, and to make a multi-dimensional list, you had to nest list structures. I assume that the early developers -- in particular, Guido van Rossum (GvR) -- felt that keeping things simple was best, initially. Slice indexing was already pretty powerful.

However, not too long after, interest grew in using Python as a scientific computing language. Between 1995 and 1997, a number of developers collaborated on a library called numeric, an early predecessor of numpy. Though he wasn't a major contributor to numeric or numpy, GvR coordinated with the numeric developers, extending Python's slice syntax in ways that made multidimensional array indexing easier. Later, an alternative to numeric arose called numarray; and in 2006, numpy was created, incorporating the best features of both.

These libraries were powerful, but they required heavy C extensions and so on. Working them into the base Python distribution would have made it bulky. And although GvR did enhance slice syntax a bit, adding fancy indexing to ordinary lists would have changed their API dramatically -- and somewhat redundantly. Given that fancy indexing could be had with an outside library already, the benefit wasn't worth the cost.

Parts of this narrative are speculative, in all honesty (see note 1 below). I don't really know the developers! But it's the same decision I would have made. In fact...

It really should be that way.

Although fancy indexing is very powerful, I'm glad it's not part of vanilla Python even today, because it means that you don't have to think very hard when working with ordinary lists. For many tasks you don't need it, and the cognitive load it imposes is significant.

Keep in mind that I'm talking about the load imposed on readers and maintainers. You may be a whiz-bang genius who can do 5-d tensor products in your head, but other people have to read your code. Keeping fancy indexing in numpy means people don't use it unless they honestly need it, which makes code more readable and maintainable in general.

3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Possibly. It's definitely environment-dependent; I don't see the same difference on my machine.


1. The parts of the narrative that aren't as speculative are drawn from a brief history told in a special issue of Computing in Science and Engineering (2011 vol. 13).

#2


3  

my_list[:,] is translated by the interpreter into

my_list.__getitem__((slice(None, None, None),))

It's like calling a function with *args, but it takes care of translating the : notation into a slice object. Without the trailing comma, it would just pass the slice. With the comma, it passes a tuple.
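
A quick way to see that translation for yourself is a tiny class that hands the key straight back. KeyEcho is a made-up name for this sketch, not anything from numpy or the standard library:

```python
class KeyEcho:
    """Hypothetical helper: returns whatever key the [] operator passes in."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()
print(echo[:])          # slice(None, None, None)
print(echo[:,])         # (slice(None, None, None),)
print(echo[:, [0, 1]])  # (slice(None, None, None), [0, 1])
print(echo[1:3, ...])   # (slice(1, 3, None), Ellipsis)
```

The interpreter does all of the packing; what happens with the resulting key is entirely up to the object's __getitem__.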

The list __getitem__ does not accept a tuple, as shown by the error. An array __getitem__ does. I believe the ability to pass a tuple and create slice objects was added as a convenience for numpy (or its predecessors). The tuple notation has never been added to the list __getitem__. (There is an operator.itemgetter class that allows a form of advanced indexing, but internally it just fetches the requested items one at a time.)
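
For comparison, this is roughly what operator.itemgetter gives you on plain lists. The variable names here are only for illustration:

```python
from operator import itemgetter

row = [10, 20, 30, 40]

# itemgetter with several indices returns a tuple of the selected items
pick = itemgetter(0, 2)
print(pick(row))                         # (10, 30)

# Selecting "columns" 0 and 1 from a nested list still takes an explicit loop
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
first_two = [list(itemgetter(0, 1)(r)) for r in my_list]
print(first_two)                         # [[1, 2], [4, 5], [7, 8]]
```

It picks several items out of one sequence, but there is no notion of a second dimension, so the row loop has to be written by hand.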

With an array you can use the tuple notation directly:

In [490]: np.arange(6).reshape((2,3))[:,[0,1]]
Out[490]: 
array([[0, 1],
       [3, 4]])
In [491]: np.arange(6).reshape((2,3))[(slice(None),[0,1])]
Out[491]: 
array([[0, 1],
       [3, 4]])
In [492]: np.arange(6).reshape((2,3)).__getitem__((slice(None),[0,1]))
Out[492]: 
array([[0, 1],
       [3, 4]])

Look at the numpy/lib/index_tricks.py file for examples of fun stuff you can do with __getitem__. You can view the file with

np.source(np.lib.index_tricks)

A nested list is a list of lists:

In a nested list, the sublists are independent of the containing list. The container just has pointers to objects elsewhere in memory:

In [494]: my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [495]: my_list
Out[495]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [496]: len(my_list)
Out[496]: 3
In [497]: my_list[1]
Out[497]: [4, 5, 6]
In [498]: type(my_list[1])
Out[498]: list
In [499]: my_list[1]='astring'
In [500]: my_list
Out[500]: [[1, 2, 3], 'astring', [7, 8, 9]]

Here I change the 2nd item of my_list; it is no longer a list, but a string.

If I apply [:] to a list I just get a shallow copy:

In [501]: xlist = my_list[:]
In [502]: xlist[1] = 43
In [503]: my_list           # didn't change my_list
Out[503]: [[1, 2, 3], 'astring', [7, 8, 9]]
In [504]: xlist
Out[504]: [[1, 2, 3], 43, [7, 8, 9]]

but changing an element of a list in xlist does change the corresponding sublist in my_list:

In [505]: xlist[0][1]=43
In [506]: my_list
Out[506]: [[1, 43, 3], 'astring', [7, 8, 9]]

To me this shows why n-dimensional indexing (as implemented for numpy arrays) doesn't make sense with nested lists. Nested lists are multidimensional only to the extent that their contents allow; there's nothing structurally or syntactically multidimensional about them.

the timings

Using two [:] on a list does not make a deep copy or work its way down the nesting. It just repeats the shallow copy step:

In [507]: ylist=my_list[:][:]
In [508]: ylist[0][1]='boo'
In [509]: xlist
Out[509]: [[1, 'boo', 3], 43, [7, 8, 9]]
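
If you actually want the sublists copied too, the standard library's copy.deepcopy does that. A small sketch of the contrast, with variable names chosen just for this example:

```python
import copy

nested = [[1, 2, 3], [4, 5, 6]]

shallow = nested[:][:]            # still one level deep, however many [:] you chain
deep = copy.deepcopy(nested)      # copies the sublists as well

print(shallow[0] is nested[0])    # True: same inner list object
print(deep[0] is nested[0])       # False: independent inner list

deep[0][1] = "boo"
print(nested)                     # [[1, 2, 3], [4, 5, 6]]
```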

arr[:,] just makes a view of arr. The difference between view and copy is part of understanding the difference between basic and advanced indexing.
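
A short sketch of that difference, assuming numpy is installed. np.shares_memory reports whether two arrays overlap in memory:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

basic = arr[:,]          # basic indexing: a view onto arr's buffer
fancy = arr[:, [0, 1]]   # advanced (fancy) indexing: a new array

print(np.shares_memory(arr, basic))   # True
print(np.shares_memory(arr, fancy))   # False

basic[0, 0] = 100        # writes through to arr
fancy[0, 1] = -1         # does not touch arr
print(arr[0, 0], arr[0, 1])           # 100 1
```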

So alist[:][:] and arr[:,] are different, but both are basic ways of making some sort of copy: a shallow copy of the list, a view of the array. Neither computes anything, and neither iterates through the elements. So a timing comparison doesn't tell us much.

#3


3  

Which __xx__ method has numpy overridden/defined to handle fancy indexing?

__getitem__ for retrieval, __setitem__ for assignment. It'd be __delitem__ for deletion, except that NumPy arrays don't support deletion.

(It's all written in C, though, so what they implemented at C level was mp_subscript and mp_ass_subscript, and __getitem__ and __setitem__ wrappers were provided by PyType_Ready. __delitem__ too, even though deletion is unsupported, because __setitem__ and __delitem__ both map to mp_ass_subscript at C level.)
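
You can poke at this from Python. A small sketch, assuming numpy is installed; I catch both ValueError and TypeError because I'd rather not promise which exception a given NumPy version raises:

```python
import numpy as np

arr = np.arange(5)

# The wrapper exists because it shares a C slot with __setitem__ ...
print(hasattr(np.ndarray, "__delitem__"))

# ... but actually deleting an element is rejected at runtime
try:
    del arr[0]
except (ValueError, TypeError) as exc:
    print(type(exc).__name__, exc)

# np.delete returns a new, shorter array instead of modifying in place
shorter = np.delete(arr, 0)
print(shorter)          # [1 2 3 4]
```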

Why don't python lists natively support fancy indexing?

Python lists are fundamentally 1-dimensional structures, while NumPy arrays are arbitrary-dimensional. Multidimensional indexing only makes sense for multidimensional data structures.

You can have a list with lists as elements, like [[1, 2], [3, 4]], but the list doesn't know or care about the structure of its elements. Making lists support l[:, 2] indexing would require the list to be aware of multidimensional structure in a way that lists aren't designed to be. It would also add a lot of complexity, a lot of error handling, and a lot of extra design decisions - how deep a copy should l[:, :] be? What happens if the structure is ragged, or inconsistently nested? Should multidimensional indexing recurse into non-list elements? What would del l[1:3, 1:3] do?
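
To make those design questions concrete, here is a deliberately naive sketch of a list subclass that accepts two indices. Grid is a hypothetical class for illustration only, and every choice in it (two indices at most, plain lists returned, ragged rows raising IndexError) is just one of several possible answers:

```python
class Grid(list):
    """Hypothetical sketch: a list of row lists that accepts grid[rows, cols]."""

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            return super().__getitem__(key)      # ordinary list indexing
        if len(key) != 2:
            raise TypeError("this sketch only handles two indices")
        row_key, col_key = key
        rows = super().__getitem__(row_key)
        if not isinstance(row_key, slice):
            return rows[col_key]                 # single row: index into it
        # Design decision: a ragged row surfaces as an IndexError here
        return [row[col_key] for row in rows]


grid = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid[:, 1])       # [2, 5, 8]
print(grid[0, 1:])      # [2, 3]
print(grid[1:, :2])     # [[4, 5], [7, 8]]

ragged = Grid([[1, 2, 3], [4]])
try:
    ragged[:, 1]
except IndexError as exc:
    print("ragged input:", exc)
```

Even this toy version leaves assignment, deletion, list-valued indices and deeper nesting unanswered, which is the kind of complexity the paragraph above is describing.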

I've seen the NumPy indexing implementation, and it's longer than the entire implementation of lists. Here's part of it. It's not worth doing that to lists when NumPy arrays satisfy all the really compelling use cases you'd need it for.

Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

NumPy indexing isn't a BLAS operation, so that's not it. I can't reproduce such dramatic timing differences, and the differences I do see look like minor Python 3 optimizations, maybe slightly more efficient allocation of tuples or slices. What you're seeing is probably due to NumPy version differences.
