aw1621107 15 hours ago

I had always thought expression templates at the very least needed the optimizer to inline/flatten the tree of function calls that are built up. For instance, for something like x + y * z I'd expect an expression template type like sum<vector, product<vector, vector>> where sum would effectively have:

    vector l;
    product& r;
    auto operator[](size_t i) {
        return l[i] + r[i];
    }

And then product<vector, vector> would effectively have:

    vector l;
    vector r;
    auto operator[](size_t i) {
        return l[i] * r[i];
    }

That would require the optimizer to inline the latter into the former to end up with a single expression, though. Is there a different way to express this that doesn't rely on the optimizer for inlining?

menaerus 14 hours ago

Expression templates do not rely on the optimizer, since you're not dealing with the computations directly but rather with expressions (nodes), through which you defer the computation until the very last moment (when you have fully built an expression of expressions, basically almost an AST). This guarantees that you get zero cost when you really need it. What you're describing is something akin to copy elision and function folding through inlining, which is pretty basic in any C++ compiler and happens automatically without special care.

aw1621107 14 hours ago

> since you're not dealing with the computations directly but rather expressions (nodes) through which you are deferring the computation part until the very last moment (when you have fully built an expression of expressions, basically almost an AST).

Right, I understand that. What is not exactly clear to me is how you get from the tree of deferred expressions to the "flat" optimized expression without involving the optimizer.

Take something like the above example for instance - w = x + y * z for vectors w/x/y/z. How do you get from that to effectively

    for (size_t i = 0; i < w.size(); ++i) {
        w[i] = x[i] + y[i] * z[i];
    }

without involving the optimizer at all?
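
For concreteness, here is a minimal sketch of how such a tree is usually driven from the assignment. All names (`et::expr`, `et::vec`, `et::sum`, `et::product`, `run_example`) are hypothetical and not taken from any particular library. The operators only build nodes holding references, and the one loop lives in `vec::operator=`, which calls `operator[]` on the root node. Note that `e[i]` is still a chain of nested `operator[]` calls in the source, so the sketch shows where the single loop comes from but does not by itself settle whether those calls get flattened.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace et {  // hypothetical namespace for this sketch

struct expr {};  // tag base so the operators below only match our own types

template <typename L, typename R>
using if_exprs = std::enable_if_t<std::is_base_of<expr, L>::value &&
                                  std::is_base_of<expr, R>::value>;

// Node for l * r. It stores references and does no work until indexed.
template <typename L, typename R>
struct product : expr {
    const L& l;
    const R& r;
    product(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
};

// Node for l + r. The references are only valid within the full expression
// that built the tree, e.g. `w = x + y * z;`.
template <typename L, typename R>
struct sum : expr {
    const L& l;
    const R& r;
    sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Leaf: owns the data.
struct vec : expr {
    std::vector<double> data;
    explicit vec(std::vector<double> d) : data(std::move(d)) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression node runs the single element-wise loop.
    template <typename E, typename = if_exprs<E, E>>
    vec& operator=(const E& e) {
        for (std::size_t i = 0; i < data.size(); ++i) {
            data[i] = e[i];
        }
        return *this;
    }
};

// The operators build nodes; they compute nothing.
template <typename L, typename R, typename = if_exprs<L, R>>
sum<L, R> operator+(const L& l, const R& r) { return sum<L, R>(l, r); }

template <typename L, typename R, typename = if_exprs<L, R>>
product<L, R> operator*(const L& l, const R& r) { return product<L, R>(l, r); }

}  // namespace et

// Builds sum<vec, product<vec, vec>> and evaluates it in one pass over w.
std::vector<double> run_example() {
    et::vec x(std::vector<double>{1.0, 2.0, 3.0});
    et::vec y(std::vector<double>{4.0, 5.0, 6.0});
    et::vec z(std::vector<double>{7.0, 8.0, 9.0});
    et::vec w(std::vector<double>{0.0, 0.0, 0.0});
    w = x + y * z;
    return w.data;
}
```

With these inputs `run_example()` returns `{29, 42, 57}`, and no temporary vector for `y * z` is ever materialized.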

menaerus 2 hours ago

The example is flawed, because that's not how you would write an expression template for the given computation. The question of how the optimizer is not involved is therefore not framed in the right context, so I can't give you an answer to it. Of course the optimizer is generally going to be involved, as it is for all code and not just expression templates, but expression templates do not require the optimizer in the way you're suggesting. Expression templates do not rely on the O1, O2 or O3 levels being set; they work the same way at O0 too, and that may be the hint you were looking for.
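
One part of this that can be checked directly is the shape of the tree itself. It is encoded in the static type of the expression, which the compiler works out during overload resolution and template instantiation, before any optimization pass runs. A minimal sketch, assuming hypothetical empty node types (`et::vec`, `et::sum`, `et::product`) that stand in for a real library's:

```cpp
#include <type_traits>
#include <utility>

namespace et {  // hypothetical names, for illustration only

struct vec {};                                          // leaf
template <typename L, typename R> struct product {};    // node for l * r
template <typename L, typename R> struct sum {};        // node for l + r

// The operators only name the resulting node type; found via ADL on et types.
template <typename L, typename R>
sum<L, R> operator+(const L&, const R&) { return {}; }

template <typename L, typename R>
product<L, R> operator*(const L&, const R&) { return {}; }

}  // namespace et

// Type of x + y * z for three vec operands, computed without running anything.
using tree = decltype(std::declval<et::vec>() +
                      std::declval<et::vec>() * std::declval<et::vec>());

// Holds at every optimization level, because it is a property of the types.
static_assert(
    std::is_same<tree, et::sum<et::vec, et::product<et::vec, et::vec>>>::value,
    "x + y * z has type sum<vec, product<vec, vec>>");
```

This only shows that building the tree does not depend on the optimization level. Whether the nested `operator[]` calls made while evaluating the tree end up as one flat expression in the generated code is a separate matter that this sketch does not address.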