grg0 3 days ago

So what exactly is the difference? It's hard to parse the assembly with so many C++-isms, but I'm gonna guess that while the raw-loop version is able to std::move transformed_data into response.data, the views version must traverse the view and copy elements the dumb way. There are some delete calls in the views version that don't appear in the raw-loop version. I think a fairer comparison would be to not store a std::vector<ToData> in Response in the views version, but to store the actual view. After all, the only reason there is a vector there to begin with is that the raw-loop version needed somewhere to store the result.

LegionMammal978 3 days ago

> I think a fairer comparison would be to not store a std::vector<ToData> in Response in the view version, but store the actual view.

That would be catastrophic, since the view only stores references into the by-value (Widget widget) argument, which is destroyed once the function returns. In general, you have to be wary of how long references remain valid.

Anyway, I'm not entirely sure what the purpose of the transformed_data vector is. If you're already going to put the results into response.data, just clear response.data and start appending elements to it directly. Or if that's infeasible, at least move transformed_data into response.data instead of copying it.
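Both suggestions, sketched with hypothetical types (FromData, ToData, Widget, Response, and the doubling transform are assumptions, not the benchmark's code). Taking Response by reference is also an assumption; it is what makes "clear it out" meaningful if a response object is reused.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-ins for the benchmark's types.
struct FromData { int value; };
struct ToData { int doubled; };
struct Widget { std::vector<FromData> items; };
struct Response { std::vector<ToData> data; };

// Option 1: skip transformed_data and append straight into response.data.
void fill_directly(const Widget& widget, Response& response) {
    response.data.clear();  // drop stale contents; the existing capacity is kept
    response.data.reserve(widget.items.size());
    for (const FromData& f : widget.items) {
        response.data.push_back(ToData{f.value * 2});
    }
}

// Option 2: if a separate vector is unavoidable, move it in rather than copying it.
void fill_then_move(const Widget& widget, Response& response) {
    std::vector<ToData> transformed_data;
    transformed_data.reserve(widget.items.size());
    for (const FromData& f : widget.items) {
        transformed_data.push_back(ToData{f.value * 2});
    }
    response.data = std::move(transformed_data);  // transfers the buffer, no per-element copy
}
```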

grg0 3 days ago

That widget argument should really be const Widget&. Then my statement above stands. I was assuming that the entire response is some function of the input, and that the input outlives everything else.

> I'm not entirely sure what the purpose of the transformed_data vector is.

I suppose that in the real application, they want something like 'response = f(g(h(...(input)...)))', where f . g . h is some non-trivial composition of a bunch of view transforms. So then it kinda makes sense. Basically, construct a lazy list, then evaluate everything at the end and store the result somewhere.

If so, then that 'f . g . h' composition might compile to something faster than the equivalent three loops with intermediate vectors, since the compiler can fuse the whole thing into one tight loop. They might have made the benchmark misleading by constructing an overly minimal example.