Say I have a matrix with dimensions A*B on the GPU, where B (the number of columns) is the leading dimension, assuming C-style storage. Is there any method in CUDA (or cuBLAS) to transpose this matrix to FORTRAN style, where A (the number of rows) becomes the leading dimension?
It would be even better if it could be transposed during the host->device transfer while keeping the original data unchanged.
3 Answers
#1
4
The CUDA SDK includes a matrix transpose sample; you can see here examples of code showing how to implement one, ranging from a naive implementation to optimized versions.
For example:
Naïve transpose
__global__ void transposeNaive(float *odata, float *idata,
                               int width, int height, int nreps)
{
    // Global coordinates of the element this thread reads from idata.
    int xIndex = blockIdx.x * TILE_DIM + threadIdx.x;
    int yIndex = blockIdx.y * TILE_DIM + threadIdx.y;
    int index_in  = xIndex + width * yIndex;   // row-major index into idata
    int index_out = yIndex + height * xIndex;  // transposed index into odata

    // nreps only repeats the copy for benchmarking purposes.
    for (int r = 0; r < nreps; r++)
    {
        // Each block covers a TILE_DIM x TILE_DIM tile with TILE_DIM x BLOCK_ROWS
        // threads, so each thread copies TILE_DIM / BLOCK_ROWS elements.
        for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS)
        {
            odata[index_out + i] = idata[index_in + i * width];
        }
    }
}
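A minimal launch sketch for the kernel above, assuming the SDK sample's usual constants TILE_DIM = 32 and BLOCK_ROWS = 8 and dimensions that are multiples of TILE_DIM (the helper launch_transpose and the chosen constants are illustrative, not part of the SDK sample):
#define TILE_DIM   32
#define BLOCK_ROWS 8

void launch_transpose(float *d_odata, float *d_idata, int width, int height)
{
    // One block per 32x32 tile; 32x8 threads, so each thread copies 4 elements.
    dim3 block(TILE_DIM, BLOCK_ROWS);
    dim3 grid(width / TILE_DIM, height / TILE_DIM);
    // nreps = 1: the repetition loop only matters for benchmarking.
    transposeNaive<<<grid, block>>>(d_odata, d_idata, width, height, 1);
}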
As talonmies pointed out, you can specify whether a matrix should be treated as transposed directly in the cuBLAS matrix operations, e.g. for cublasDgemm(), where C = alpha * op(A) * op(B) + beta * C: if you want to operate on A as transposed (A^T), you specify that in the corresponding parameter ('N' for normal or 'T' for transposed in the legacy API; CUBLAS_OP_N / CUBLAS_OP_T in the v2 API).
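A minimal sketch of that idea with the cuBLAS v2 API (the wrapper name gemm_with_transposed_A and the choice of dimensions are illustrative assumptions):
#include <cublas_v2.h>

// Compute C = A^T * B without materializing A^T.
// Column-major conventions: dA is m x k (ld = m), dB is m x n (ld = m),
// so op(A) = A^T is k x m and the result dC is k x n (ld = k).
void gemm_with_transposed_A(cublasHandle_t handle,
                            const double *dA, const double *dB, double *dC,
                            int m, int k, int n)
{
    const double alpha = 1.0;
    const double beta  = 0.0;
    cublasDgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                k, n, m,
                &alpha, dA, m,
                        dB, m,
                &beta,  dC, k);
}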
#2
8
As asked in the title, to transpose a device row-major matrix A[m][n], one can do it this way:
float* clone = ...;  // copy the contents of A to clone (geam cannot transpose in place)
float const alpha(1.0);
float const beta(0.0);
cublasHandle_t handle;
cublasCreate(&handle);
// C = alpha * op(A) + beta * op(B): here op(A) = clone^T, beta = 0, and the result overwrites A.
cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, &alpha, clone, n, &beta, clone, m, A, m);
cublasDestroy(handle);
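One possible way to produce the clone buffer above is a plain device-to-device copy before the geam call (a sketch, assuming A already lives on the device and m, n are its row-major dimensions):
float *clone = NULL;
cudaMalloc((void **)&clone, sizeof(float) * m * n);
// Device-to-device copy so the original contents survive being overwritten by the transpose.
cudaMemcpy(clone, A, sizeof(float) * m * n, cudaMemcpyDeviceToDevice);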
And, to multiply two row-major matrices A[m][k] and B[k][n], C = A*B:
// In column-major terms this computes (A*B)^T = B^T * A^T, since the column-major
// view of a row-major matrix is its transpose; the result is C = A*B in row-major layout.
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha, B, n, A, k, &beta, C, n);
where C is also a row-major matrix.
#3
4
The version of CUBLAS bundled with the CUDA 5 toolkit contains a BLAS-like method (cublas<t>geam) that can be used to transpose a matrix. It's documented here.