Why aren't regex match objects iterable even though they implement __getitem__?

Date: 2022-04-04 22:13:24

As you may know, implementing a __getitem__ method makes a class iterable:

class IterableDemo:
    def __getitem__(self, index):
        if index > 3:
            raise IndexError

        return index

demo = IterableDemo()
print(demo[2])  # 2
print(list(demo))  # [0, 1, 2, 3]
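
For context, the fallback that makes IterableDemo iterable can be imitated in plain Python. The generator below is only a sketch of the legacy sequence protocol (index from 0 upward until IndexError is raised); legacy_iter is a hypothetical helper name, not part of the standard library.

```python
class IterableDemo:
    def __getitem__(self, index):
        if index > 3:
            raise IndexError
        return index


def legacy_iter(obj):
    """Imitate the sequence fallback: obj[0], obj[1], ... until IndexError."""
    index = 0
    while True:
        try:
            item = obj[index]
        except IndexError:
            return  # IndexError marks the end of the sequence
        yield item
        index += 1


demo = IterableDemo()
print(hasattr(demo, '__iter__'))  # False: the class defines no __iter__
print(list(legacy_iter(demo)))    # [0, 1, 2, 3]
print(list(demo))                 # [0, 1, 2, 3], via the built-in fallback
```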

However, this doesn't hold true for regex match objects:

>>> import re
>>> match = re.match('(ab)c', 'abc')
>>> match[0]
'abc'
>>> match[1]
'ab'
>>> list(match)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '_sre.SRE_Match' object is not iterable

It's worth noting that this exception isn't thrown in the __iter__ method, because that method isn't even implemented:

>>> hasattr(match, '__iter__')
False

So, how is it possible to implement __getitem__ without making the class iterable?

1 answer

#1


6  

There are lies, damned lies and then there is Python documentation.

For a class implemented in C, having a __getitem__ is not enough to make it iterable. That is because there are actually two places in the PyTypeObject that __getitem__ can be mapped to: tp_as_sequence and tp_as_mapping. Both have a slot for __getitem__, namely sq_item and mp_subscript respectively ([1], [2]).

Looking at the source of the SRE_Match, tp_as_sequence is initialized to NULL whereas tp_as_mapping is defined.
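
One way to see why a mapping-style slot is a plausible fit: on Python 3.6 and later, where match objects support subscripting, the key can also be a group name rather than an integer position. The pattern and the group name first below are made up for illustration.

```python
import re

# (?P<first>...) gives the group a name as well as a number
match = re.match(r'(?P<first>ab)c', 'abc')

print(match[0])        # abc  (whole match, integer key)
print(match[1])        # ab   (first group, integer key)
print(match['first'])  # ab   (same group, looked up by name)
```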

The iter() built-in function, when called with one argument, calls PyObject_GetIter, which contains the following code:

f = t->tp_iter;
if (f == NULL) {
    if (PySequence_Check(o))
        return PySeqIter_New(o);
    return type_error("'%.200s' object is not iterable", o);
}

It first checks the tp_iter slot (obviously NULL for SRE_Match objects). Failing that, if PySequence_Check returns true, it returns a new sequence iterator; otherwise a TypeError is raised.

PySequence_Check first checks whether the object is a dict or an instance of a dict subclass, and returns false in that case. Otherwise it returns the value of

s->ob_type->tp_as_sequence &&
    s->ob_type->tp_as_sequence->sq_item != NULL;

and since s->ob_type->tp_as_sequence is NULL for an SRE_Match instance, 0 is returned, and PyObject_GetIter raises TypeError: '_sre.SRE_Match' object is not iterable.
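
The same check can be probed from Python itself. This is a rough sketch that assumes CPython, where ctypes.pythonapi exposes the C API; the argtypes and restype declarations are set up here by hand, and IterableDemo is the class from the question.

```python
import ctypes
import re

# Assumption: CPython only. ctypes.pythonapi gives access to the C API.
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int


class IterableDemo:
    def __getitem__(self, index):
        if index > 3:
            raise IndexError
        return index


match = re.match('(ab)c', 'abc')

print(PySequence_Check(IterableDemo()))  # 1: Python classes fill sq_item
print(PySequence_Check(match))           # 0: no sq_item on match objects
print(PySequence_Check({}))              # 0: dicts are excluded explicitly
print(PySequence_Check([]))              # 1

try:
    iter(match)
except TypeError as exc:
    print(exc)  # exact wording depends on the Python version
```

A class written in Python passes the check because defining __getitem__ fills both the sequence and the mapping slot, which is why the shortcut from the question works there but not for the C-implemented match type.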
