I have an arbitrarily deeply nested list whose sublists vary in length:
my_list = [[[1,2],[4]],[[4,4,3]],[[1,2,1],[4,3,4,5],[4,1]]]
I want to convert this to a valid numeric (not object) numpy array by padding each axis with NaN, so the result should look like this:
padded_list = np.array([[[  1,   2, nan, nan],
                         [  4, nan, nan, nan],
                         [nan, nan, nan, nan]],
                        [[  4,   4,   3, nan],
                         [nan, nan, nan, nan],
                         [nan, nan, nan, nan]],
                        [[  1,   2,   1, nan],
                         [  4,   3,   4,   5],
                         [  4,   1, nan, nan]]])
How do I do this?
2 Answers
#1
6
This works on your sample, but I'm not sure it handles every corner case properly:
from itertools import zip_longest  # on Python 2 this is izip_longest

import numpy as np

def find_shape(seq):
    # A scalar has no len() and contributes no axes.
    try:
        len_ = len(seq)
    except TypeError:
        return ()
    shapes = [find_shape(subseq) for subseq in seq]
    # Take the largest extent seen at each nesting depth.
    return (len_,) + tuple(max(sizes) for sizes in zip_longest(*shapes,
                                                               fillvalue=1))

def fill_array(arr, seq):
    if arr.ndim == 1:
        try:
            len_ = len(seq)
        except TypeError:
            len_ = 0
        arr[:len_] = seq
        arr[len_:] = np.nan
    else:
        # Missing subsequences are treated as empty, so their rows become NaN.
        for subarr, subseq in zip_longest(arr, seq, fillvalue=()):
            fill_array(subarr, subseq)
And now:
>>> arr = np.empty(find_shape(my_list))
>>> fill_array(arr, my_list)
>>> arr
array([[[  1.,   2.,  nan,  nan],
        [  4.,  nan,  nan,  nan],
        [ nan,  nan,  nan,  nan]],
       [[  4.,   4.,   3.,  nan],
        [ nan,  nan,  nan,  nan],
        [ nan,  nan,  nan,  nan]],
       [[  1.,   2.,   1.,  nan],
        [  4.,   3.,   4.,   5.],
        [  4.,   1.,  nan,  nan]]])
I think this is roughly what the shape discovery routines of numpy do. Since there are lots of Python function calls involved anyway, it probably won't compare that badly against the C implementation.
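The two steps above can also be folded into a single call. The following is only a sketch, not part of the original answer: `pad_nested` and its `fill` parameter are hypothetical names, it pre-fills the array with `np.full` instead of writing NaN slice by slice, and it assumes every leaf value sits at the same nesting depth.

```python
from itertools import zip_longest

import numpy as np


def find_shape(seq):
    """Return the smallest rectangular shape that can hold `seq`."""
    try:
        len_ = len(seq)
    except TypeError:
        return ()  # a scalar contributes no axes
    shapes = [find_shape(subseq) for subseq in seq]
    return (len_,) + tuple(
        max(sizes) for sizes in zip_longest(*shapes, fillvalue=1)
    )


def pad_nested(seq, fill=np.nan):
    """Hypothetical helper: copy `seq` into a float array pre-filled with `fill`.

    Assumes uniform nesting depth (no scalars mixed in with sublists).
    """
    arr = np.full(find_shape(seq), fill, dtype=float)

    def _fill(sub_arr, sub_seq):
        if sub_arr.ndim == 1:
            # Overwrite only the leading entries; the rest keep `fill`.
            sub_arr[:len(sub_seq)] = sub_seq
        else:
            # zip stops at the shorter input, so missing rows keep `fill`.
            for child_arr, child_seq in zip(sub_arr, sub_seq):
                _fill(child_arr, child_seq)

    _fill(arr, seq)
    return arr


my_list = [[[1, 2], [4]], [[4, 4, 3]], [[1, 2, 1], [4, 3, 4, 5], [4, 1]]]
padded = pad_nested(my_list)
print(padded.shape)  # (3, 3, 4)
```

Because iterating over a multi-dimensional array yields views, the nested assignments write directly into `arr`, and the input list is left untouched.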
#2
1
First, find the longest sublist at each of the two inner nesting levels:
from itertools import chain
import numpy

len1 = max(len(el) for el in my_list)
len2 = max(len(el) for el in chain(*my_list))
Second, pad the missing entries with NaN:
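# Note: this modifies my_list in place.
for el1 in my_list:
    # Build a fresh list per missing row; [[]] * n would repeat one shared list.
    el1.extend([] for _ in range(len1 - len(el1)))
    for el2 in el1:
        el2.extend([numpy.nan] * (len2 - len(el2)))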
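The snippet above stops before building the array and pads `my_list` in place. As a rough sketch of the same fixed-depth idea that leaves the input alone, the version below writes each row into a pre-filled array. The name `pad_three_levels` and its `fill` parameter are made up for illustration, and it assumes exactly three levels of nesting, as in the question's sample.

```python
import numpy as np


def pad_three_levels(nested, fill=np.nan):
    """Hypothetical helper: pad a list of lists of lists into a 3-D float array."""
    # Longest sublist at the second and third nesting levels.
    len1 = max((len(rows) for rows in nested), default=0)
    len2 = max((len(row) for rows in nested for row in rows), default=0)
    out = np.full((len(nested), len1, len2), fill, dtype=float)
    for i, rows in enumerate(nested):
        for j, row in enumerate(rows):
            # Copy the values that exist; the tail of the row keeps `fill`.
            out[i, j, :len(row)] = row
    return out


my_list = [[[1, 2], [4]], [[4, 4, 3]], [[1, 2, 1], [4, 3, 4, 5], [4, 1]]]
padded = pad_three_levels(my_list)
print(padded.shape)  # (3, 3, 4)
```

Unlike the recursive approach in the first answer, this does not generalise to arbitrary depth, but it avoids recursion when the depth is known in advance.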