Rearrange and stack a 2D array to form a 3D array

I have a DataFrame as below:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})

I want to get this 3D array:

array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]], dtype=int64)

I tried this:

values = df.values
values.reshape(3, 2, 3)

but I get an incorrect array. How can I get the expected array?

2 Solutions

#1

Get the array data, then reshape it so that the first axis is split into two axes, the first of them of length 2. This gives a 3D array. Then swap those two axes:

df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)

Sample run:

In [711]: df
Out[711]: 
   a   b    c
0  1  10   80
1  1  20   80
2  1  30   80
3  2  20  120
4  2  40  120
5  2  60  120

In [713]: df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
Out[713]: 
array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]])
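
To see why the plain reshape(3, 2, 3) from the question goes wrong and why the reshape-then-swap works, the steps can be traced one at a time. This is only a minimal sketch on the question's data; the names `values`, `naive`, `split` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})
values = df.values                               # shape (6, 3)

# The naive reshape pairs up consecutive rows: (row 0, row 1), (row 2, row 3), ...
naive = values.reshape(3, 2, 3)

# Split the row axis into 2 groups of 3 rows: split[g, i] is row i of group g
split = values.reshape(2, -1, values.shape[1])   # shape (2, 3, 3)

# Swap the group and row axes so that result[i, g] is row i of group g
result = split.swapaxes(0, 1)                    # shape (3, 2, 3)

print(naive[0])    # rows 0 and 1 of df, both from the a == 1 group
print(result[0])   # the first row of each group
```

The naive reshape keeps rows in their original order, so each block of two rows comes from the same group; the swap is what interleaves the groups.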

This gives us a view into the array returned by df.values without making a copy, so its cost is small and roughly constant regardless of the array size.
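
One way to check the no-copy claim is np.shares_memory. The sketch below assumes a single-dtype DataFrame (with mixed dtypes, df.values has to build a new array first); the names `view_result` and `copy_result` are illustrative.

```python
import numpy as np
import pandas as pd

# Single-dtype frame: 6 rows, 3 integer columns
df = pd.DataFrame(np.arange(18).reshape(6, 3), columns=list('abc'))
values = df.values

# Reshape + swapaxes only changes shape and strides
view_result = values.reshape(2, -1, values.shape[1]).swapaxes(0, 1)

# np.stack allocates a new array and copies the pieces into it
copy_result = np.stack(np.split(values, 2), axis=1)

shares_view = np.shares_memory(values, view_result)
shares_copy = np.shares_memory(values, copy_result)
print(shares_view, shares_copy)
```

Because the result is a view, writing into it would also modify `values`; call .copy() on it if an independent array is needed.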

Runtime test

Case #1:

In [730]: df = pd.DataFrame(np.random.randint(0,9,(2000,100)))

# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [731]: %timeit np.stack(np.split(df.values, 2), axis=1)
10000 loops, best of 3: 109 µs per loop

In [732]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.55 µs per loop

Case #2:

In [733]: df = pd.DataFrame(np.random.randint(0,9,(2000,2000)))

# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [734]: %timeit np.stack(np.split(df.values, 2), axis=1)
100 loops, best of 3: 4.3 ms per loop

In [735]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.37 µs per loop
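
The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard timeit module. This is an assumption-laden stand-in: a plain NumPy array replaces df.values, and the helper names `via_stack` and `via_reshape` are hypothetical. Absolute numbers will differ by machine and library version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 9, size=(2000, 100))   # stands in for df.values

def via_stack(a):
    # Split into 2 equal row blocks, then stack them along a new axis 1 (copies)
    return np.stack(np.split(a, 2), axis=1)

def via_reshape(a):
    # Split the row axis into (2, n/2), then swap the two new axes (view)
    return a.reshape(2, -1, a.shape[1]).swapaxes(0, 1)

n = 100
# Best of 3 repeats, reported as seconds per call
t_stack = min(timeit.repeat(lambda: via_stack(arr), number=n, repeat=3)) / n
t_reshape = min(timeit.repeat(lambda: via_reshape(arr), number=n, repeat=3)) / n

print(f"stack:   {t_stack * 1e6:.2f} us per call")
print(f"reshape: {t_reshape * 1e6:.2f} us per call")
```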

#2

Try np.split + np.stack:

np.stack(np.split(df.values, 2), axis=1)

array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]])
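
The intermediate shapes of this approach can be traced as follows. This is a small sketch on the question's data; the names `pieces`, `stacked` and `reshaped` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})

# Split the 6 rows into 2 equal blocks: a list of two (3, 3) arrays
pieces = np.split(df.values, 2)

# Stack the blocks along a new axis 1: stacked[i, g] is row i of block g
stacked = np.stack(pieces, axis=1)               # shape (3, 2, 3)

# Same result as the reshape + swapaxes approach
reshaped = df.values.reshape(2, -1, df.shape[1]).swapaxes(0, 1)
same = np.array_equal(stacked, reshaped)
print(stacked.shape, same)
```

Note that np.split with an integer argument raises a ValueError if the rows cannot be divided into equal blocks.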
