batch_dot with a dynamic batch size in Keras

Time: 2022-01-23 15:37:02

I'm trying to write a layer to merge 2 tensors with such a formula.

The shapes of x[0] and x[1] are both (?, 1, 500).

M is a 500*500 Matrix.

I want the output to be (?, 500, 500), which I think is theoretically feasible. The layer should output (1, 500, 500) for every pair of inputs, each of shape (1, 1, 500). Since the batch_size is variable (dynamic), the output must be (?, 500, 500).

However, I know little about the axes argument; I have tried every combination of axes, but none of them makes sense.

I tried numpy.tensordot and keras.backend.batch_dot (with the TensorFlow backend). If the batch_size is fixed, taking a = (100, 1, 500) for example, batch_dot(a, M, (2, 0)) gives an output of shape (100, 1, 500).
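
For reference, a minimal numpy sketch of that fixed-batch case (my own illustration; the shapes are the ones from the question, the values are random):

import numpy as np

a = np.random.rand(100, 1, 500)   # like x[0], but with a fixed batch size of 100
M = np.random.rand(500, 500)

# contract a's axis 2 with M's axis 0, as in batch_dot(a, M, (2, 0))
out = np.tensordot(a, M, axes=(2, 0))
print(out.shape)                  # (100, 1, 500)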

I'm a newbie with Keras, sorry for such a stupid question, but I have spent 2 days trying to figure this out and it's driving me crazy :(

    def call(self, x):
        input1 = x[0]
        input2 = x[1]
        # self.M is defined in the build function
        output = K.batch_dot(...)
        return output

Update:

Sorry for being late. I tried Daniel's answer with TensorFlow as Keras's backend, and it still raises a ValueError about unequal dimensions.

I tried the same code with Theano as the backend, and now it works.

>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])

I don't know how to print a tensor's shape in Theano; it's definitely harder than TensorFlow for me. However, it works.
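
One thing that might help (I have not fully verified it): K.int_shape reads the _keras_shape attribute that Keras attaches when it can infer a static shape, and Theano tensors always expose ndim.

>>> K.int_shape(firstMul)   # static shape Keras inferred, if available
>>> firstMul.ndim           # number of dimensions, always available on a Theano tensor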

To figure this out, I went through the two versions of the backend code, TensorFlow and Theano. The differences are below.

In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2]

In tensorflow_backend:

return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)

In theano_backend:

return T.batched_tensordot(x, y, axes=axes)

(This assumes the subsequent assignments to out._keras_shape do not affect out's value.)
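
To make the mismatch concrete, here is a rough numpy re-enactment of that TensorFlow branch (a sketch under the assumption that adjoint_a/adjoint_b simply transpose the last two axes of real-valued operands; the actual backend code does more):

import numpy as np

x = np.zeros((4, 1, 500))     # stands in for the (?, 1, 500) input, batch of 4
y = np.zeros((1, 500, 500))   # the (1, 500, 500) M

# adjoint_a / adjoint_b transpose the last two axes of each operand
xT = np.swapaxes(x, -1, -2)   # (4, 500, 1)
yT = np.swapaxes(y, -1, -2)   # (1, 500, 500)

# matmul needs xT's last dim (1) to match yT's second-to-last dim (500),
# so this raises the dimension-mismatch error
try:
    np.matmul(xT, yT)
except ValueError as e:
    print(e)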

2 Answers

#1


Your multiplications should select which axes they use in the batch_dot function:

  • Axis 0 - the batch dimension, it's your ?
  • Axis 1 - the dimension you say has length 1
  • Axis 2 - the last dimension, of size 500

You won't change the batch dimension, so you will always use batch_dot with axes=[1,2].

But for that to work, you must adjust M to be (?, 500, 500).
For that, define M not as (500,500) but as (1,500,500), and repeat it along the first axis to match the batch size:

import keras.backend as K

#Being M with shape (1,500,500), we repeat it.   
BatchM = K.repeat_elements(x=M,rep=batch_size,axis=0)
#Not sure if repeating is really necessary, leaving M as (1,500,500) gives the same output shape at the end, but I haven't checked actual numbers for correctness, I believe it's totally ok. 

#Now we can use batch dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)

#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1],(0,2,1))

#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])

#2


I prefer using TensorFlow, so I tried to figure it out with TensorFlow over the past few days.

The first approach is quite similar to Daniel's solution.

x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>

You need to feed M values with matching shapes.

sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1.,  4.,  6.]]], dtype=float32)

Another, simpler way is tf.einsum.

x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# return: a tensor of shape (?, 1, 3), dtype=float32

Let's feed some values.

sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1.,  4.,  6.]]], dtype=float32)

Now M is a 2D tensor and there is no need to feed batch_size into M.
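
As a sketch of my own (not code from the question), this is how it could look inside the custom layer's call, assuming self.M is the (500, 500) weight defined in build:

import tensorflow as tf

def call(self, x):
    # x[0] has shape (?, 1, 500); self.M is (500, 500), so no batch tiling is needed
    return tf.einsum('bij,jk->bik', x[0], self.M)   # -> (?, 1, 500)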

What's more, it seems such a question can now be solved in TensorFlow with tf.einsum. Does that mean Keras should invoke tf.einsum in some situations? At least I can find nowhere that Keras calls tf.einsum. And in my opinion, Keras behaves weirdly when batch_dot is given a 3D tensor and a 2D tensor. In Daniel's answer, he pads M to (1,500,500), but in K.batch_dot() M will be adjusted to (500,500,1) automatically. I find that TF adjusts it with broadcasting rules, and I'm not sure Keras does the same.
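
For what it's worth, numpy's matmul shows those broadcasting rules directly (whether a given tf.matmul version broadcasts the batch axis the same way depends on the TensorFlow version):

import numpy as np

x = np.ones((4, 1, 3))    # a batch of 4 row vectors
M = np.ones((1, 3, 3))    # one matrix with a broadcastable batch axis

# the leading batch dimensions broadcast (1 against 4), so no explicit tiling is needed
print(np.matmul(x, M).shape)   # (4, 1, 3)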
