I was thinking about a simple SIMD class that supports overloaded arithmetic operators (+, -, *, /, etc.).
While implementing it as a class template to support different kinds of intrinsics, I noticed that some of the available intrinsics perform multiple operations at once (_mm_fmadd_ps for multiplication and addition).
Now I wonder whether there is a relatively sane way to keep using the math operator overloads
a * b + c -> madd( a , b , c )
instead of normal free functions
add( mul( a , b ) , c ) -> madd( a , b , c )
and still use these newer intrinsics.
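To make the two spellings concrete, here is a minimal sketch, assuming a hypothetical 4-lane `vec4` type built on `std::array<float, 4>` (a real class would wrap `__m128`), with `std::fma` applied per lane as a portable stand-in for `_mm_fmadd_ps`. The operators only forward to the free functions, so `a * b + c` still runs as `add( mul( a , b ) , c )`; only the explicit `madd` call is a single fused operation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane float vector; a real SIMD class would wrap __m128.
struct vec4 {
    std::array<float, 4> v;
};

// Free functions: one call per operation.
inline vec4 add(const vec4& a, const vec4& b) {
    vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline vec4 mul(const vec4& a, const vec4& b) {
    vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Fused multiply-add per lane; std::fma stands in for _mm_fmadd_ps here.
inline vec4 madd(const vec4& a, const vec4& b, const vec4& c) {
    vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = std::fma(a.v[i], b.v[i], c.v[i]);
    return r;
}

// The operators simply forward, so a * b + c becomes add(mul(a, b), c):
// two separate operations, never one madd.
inline vec4 operator+(const vec4& a, const vec4& b) { return add(a, b); }
inline vec4 operator*(const vec4& a, const vec4& b) { return mul(a, b); }
```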
So my question boils down to:
- Is it possible to chain multiple (independent) function calls so that only one specific function is called (general question, not SIMD related)?
- If proxies can do this, are they worth it?
- If not, what's a good API design for a SIMD container that provides the normal operations and can also use the newer intrinsics?
  - provide operator overloads and free functions at the same time
  - discard operator overloads and rely only on free functions
- Is the compiler allowed to fold intrinsics, i.e. automatically use the new ones where appropriate? (e.g. collapse add( mul( a , b ) , c ) to madd( a , b , c ) when the intrinsics are available and/or already in use for the required intrinsics version)
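The folding question has a numerical side: a fused multiply-add rounds once, while `a * b + c` rounds after the multiplication and again after the addition, so the two are not always interchangeable. The sketch below uses hypothetical helpers `fused` and `unfused`; the `volatile` intermediate forces the product to be rounded to `double` before the addition. With a = 1 + 2^-30 and c = -(1 + 2^-29), `fused(a, a, c)` yields 2^-60 while `unfused(a, a, c)` yields 0. Whether a compiler may contract `a * b + c` into one FMA therefore depends on its floating-point contraction settings (GCC and Clang expose this as `-ffp-contract`).

```cpp
#include <cassert>
#include <cmath>

// a * b + c with a single rounding, as an FMA instruction computes it.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// a * b rounded to double first, then + c rounded again.
// Storing the product in a volatile forces that first rounding, so the
// two operations cannot be contracted into one FMA.
double unfused(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```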
#1
Something like this (you'll probably want to use maximum optimiser settings):
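#include <iostream>

// Thin wrapper around a value type; the conversion operators let it be
// passed wherever a (const) reference to the underlying type is expected.
template<class Intrinsic>
struct optimised
{
    using type = Intrinsic;
    optimised(type v)
        : _v(v)
    {}
    operator type&() {
        return _v;
    }
    operator const type&() const {
        return _v;
    }
    type _v;
};

// naive implementation of madd (a real version would call e.g. _mm_fmadd_ps)
double madd(double a, double b, double c) {
    std::cout << "madd(" << a << ", " << b << ", " << c << ")" << std::endl;
    return (a * b) + c;
}

// Proxy returned by operator*: it records the operands instead of multiplying.
// It holds references, so it must not outlive the operands it refers to.
struct mul_result
{
    mul_result(const double& a, const double& b)
        : _a(a), _b(b)
    {}
    // Used when the product is not followed by "+ c": plain multiplication.
    operator double() const {
        return _a * _b;
    }
    const double &_a, &_b;
};

// (a * b) + c: the pending product and the addend are fused into one madd call.
double operator+(const mul_result& ab, const double& c)
{
    return madd(ab._a, ab._b, c);
}

mul_result operator*(const optimised<double>& a, const optimised<double>& b)
{
    return mul_result(a, b);
}

using namespace std;

int main()
{
    optimised<double> a = 3, b = 7, c = 2;
    auto x = a * b + c;   // calls madd(3, 7, 2); x is a double
    cout << x << endl;
    return 0;
}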
Expected output:
madd(3, 7, 2)
23
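The answer's `mul_result` keeps references to the operands and only covers `double` in the order `a * b + c`. Below is a minimal sketch of the same proxy idea written as templates, with hypothetical names `simd` and `mul_expr`: the proxy stores copies of the operands (cheap for register-sized SIMD values) so it cannot dangle, `c + a * b` and `a * b + c * d` are fused as well, and `std::fma` stands in for the real fused intrinsic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper; T would be __m128 or similar in a real SIMD class.
template <class T>
struct simd {
    T v;
};

// Unevaluated product. Operands are stored by value, so the proxy
// stays valid even when the operands were temporaries.
template <class T>
struct mul_expr {
    simd<T> a, b;
    // Used when the product is not followed by an addition.
    operator simd<T>() const { return {a.v * b.v}; }
};

// Stand-in for the fused intrinsic (e.g. _mm_fmadd_ps); std::fma for scalars.
template <class T>
simd<T> madd(const simd<T>& a, const simd<T>& b, const simd<T>& c) {
    return {std::fma(a.v, b.v, c.v)};
}

template <class T>
mul_expr<T> operator*(const simd<T>& a, const simd<T>& b) { return {a, b}; }

// Plain addition of two already evaluated values.
template <class T>
simd<T> operator+(const simd<T>& a, const simd<T>& b) { return {a.v + b.v}; }

// a * b + c and c + a * b both collapse into one madd call.
template <class T>
simd<T> operator+(const mul_expr<T>& ab, const simd<T>& c) { return madd(ab.a, ab.b, c); }

template <class T>
simd<T> operator+(const simd<T>& c, const mul_expr<T>& ab) { return madd(ab.a, ab.b, c); }

// a * b + c * d: fuse the first product, evaluate the second one.
template <class T>
simd<T> operator+(const mul_expr<T>& ab, const mul_expr<T>& cd) {
    return madd(ab.a, ab.b, static_cast<simd<T>>(cd));
}
```

One trade-off of the proxy approach shows up immediately: `auto p = a * b;` deduces `mul_expr<double>`, not `simd<double>`, and every further fused pattern (for example `a * b - c`) needs its own set of overloads.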