Sparrow: C++20 Idiomatic APIs for the Apache Arrow Columnar Format (github.com)
35 points by tanelpoder 7 days ago | 17 comments
levzettelin 3 days ago | parent | next [-]

  // You are responsible for releasing the structure in the end
  arrow_array.release(&arrow_array);
This doesn't look like RAII. How is this idiomatic for C++20? Why do you have to pass a pointer to "this" again as an explicit argument?
rfoo 3 days ago | parent | next [-]

This is the extracted Arrow C data interfaces as documented in https://arrow.apache.org/docs/format/CDataInterface.html

It's not how you interact with the data in your own C++ code, it's for passing this data to other in-process consumers (libraries etc). While in the example it calls the release function, this is usually just passed to a downstream consumer and it's their responsibility to call it.

I agree that having such an example as the first one is confusing. Given that a large part of the point of Apache Arrow is passing columnar data in memory between libraries written in different languages, it makes some sense.
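To make the ownership question concrete, here is a minimal sketch of how a C++ caller could wrap the C struct in RAII and still hand it to a downstream consumer. The `ArrowArray` layout follows the C Data Interface specification linked above; everything else (`OwnedArrowArray`, `make_toy_array`, `release_toy_array`, `ToyStorage`, `g_release_calls`) is a hypothetical name invented for illustration and is not Sparrow's or Arrow's API. It also shows why the pointer is passed explicitly: `release` is a plain C function pointer, so there is no implicit `this`.

```cpp
#include <cassert>
#include <cstdint>

// Struct layout as given in the Arrow C Data Interface specification.
struct ArrowArray {
  std::int64_t length;
  std::int64_t null_count;
  std::int64_t offset;
  std::int64_t n_buffers;
  std::int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);  // C callback: no implicit "this"
  void* private_data;
};

// Hypothetical producer-side storage for a 3-element int32 array.
struct ToyStorage {
  std::int32_t values[3];
  const void* buffers[2];  // [validity bitmap (absent), data]
};

static int g_release_calls = 0;  // instrumentation for this sketch only

// Release callback: frees producer memory and marks the struct as released.
void release_toy_array(ArrowArray* array) {
  delete static_cast<ToyStorage*>(array->private_data);
  array->release = nullptr;
  ++g_release_calls;
}

// Hypothetical producer that fills in the C struct.
ArrowArray make_toy_array() {
  auto* storage = new ToyStorage{{1, 2, 3}, {nullptr, nullptr}};
  storage->buffers[1] = storage->values;
  ArrowArray a{};
  a.length = 3;
  a.n_buffers = 2;
  a.buffers = storage->buffers;
  a.release = &release_toy_array;
  a.private_data = storage;
  return a;
}

// Hypothetical RAII owner: releases on destruction unless ownership was
// handed to a consumer first.
class OwnedArrowArray {
public:
  explicit OwnedArrowArray(ArrowArray array) noexcept : array_(array) {}
  OwnedArrowArray(const OwnedArrowArray&) = delete;
  OwnedArrowArray& operator=(const OwnedArrowArray&) = delete;
  OwnedArrowArray(OwnedArrowArray&& other) noexcept : array_(other.array_) {
    other.array_.release = nullptr;  // source no longer owns anything
  }
  ~OwnedArrowArray() {
    if (array_.release != nullptr) array_.release(&array_);
  }

  const ArrowArray& get() const noexcept { return array_; }

  // Move the struct out; the receiver becomes responsible for release.
  ArrowArray release_to_consumer() noexcept {
    ArrowArray out = array_;
    array_.release = nullptr;
    return out;
  }

private:
  ArrowArray array_{};
};
```

With a wrapper like this, the explicit `release` call only appears in whichever consumer receives the struct from `release_to_consumer()`; code that keeps the array local never calls it by hand.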

CyberDildonics 3 days ago | parent [-]

> It's not how you interact with the data in your own C++ code, it's for passing this data to other in-process consumers (libraries etc). While in the example it calls the release function, this is usually just passed to a downstream consumer and it's their responsibility to call it.

This seems like a strange rationalization, since you don't need an explicit release call to be able to pass it to something else.

pjmlp 3 days ago | parent | prev [-]

RAII predates C++98. I was already used to it in Turbo C++ for MS-DOS, and it is a pity we need to keep advocating for it as something extraordinary.

ender341341 3 days ago | parent | next [-]

I think you're partly making the point for them: RAII has been idiomatic C++ since before C++ was standardized. Missing it wasn't idiomatic even in C++98, so a C++20 library that lacks it definitely isn't idiomatic either.

pjmlp 3 days ago | parent [-]

Indeed, that is the point.

CyberDildonics 3 days ago | parent | prev [-]

This doesn't have anything to do with what they said; they didn't say RAII was new.

pjmlp 3 days ago | parent [-]

It might be misunderstood by readers not skilled in C++ when they read:

> This doesn't look like RAII. How is this idiomatic for C++20?

CyberDildonics 3 days ago | parent [-]

You can try to be insulting if you want, but if you could explain the connection, I think you would have already.

pjmlp 3 days ago | parent [-]

I wasn't.

CyberDildonics 3 days ago | parent [-]

You weren't what? Who are you saying "isn't skilled in C++" here and why would that matter?

pjmlp 3 days ago | parent [-]

Those who, on reading that sentence, think RAII is something new in C++20.

CyberDildonics 2 days ago | parent [-]

Why would someone who knows what RAII and C++20 are end up thinking that?

mgaunard 3 days ago | parent | prev [-]

The official arrow implementation is already in C++11, not sure what the value proposition of this is.

jcelerier 3 days ago | parent | next [-]

From a quick glance, https://man-group.github.io/sparrow/basic_usage.html looks infinitely more like a library I want to use than https://arrow.apache.org/cookbook/cpp/basic.html

rfoo 3 days ago | parent | prev [-]

<rant>The official Arrow C++ implementation is just ergonomic warts, full of `const std::shared_ptr<T>&` bs. Trying to use it to manipulate data always gives me a headache telling apart WTH an Array, an ArrayData, and a Buffer are, and the typed Array interfaces are barely usable. The original official Rust port inherited all the misdesigns too. On the Rust side someone created arrow2 [0] to fix it.</rant>

And I'm glad there's a good C++ impl too.

[0] https://github.com/jorgecarleitao/arrow2

mgaunard 3 days ago | parent [-]

That's because a given Arrow column is actually several arrays of arrays.

Array, ArrayData and Buffer map to different layers of the abstraction.
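As a rough illustration of that layering, here is a toy model of a variable-length string column: raw byte buffers at the bottom, an untyped bundle of buffers in the middle, and a typed accessor on top. All type and function names (`ToyBuffer`, `ToyArrayData`, `ToyStringArray`, `make_toy_strings`) are hypothetical stand-ins, not the Arrow C++ classes, and the sketch assumes 32-bit offsets and an LSB-first validity bitmap as in the Arrow columnar format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Layer 1 (hypothetical): a contiguous block of untyped bytes.
struct ToyBuffer {
  std::vector<std::uint8_t> bytes;
};

// Layer 2 (hypothetical): untyped description of one array,
// here with buffers = [validity bitmap, int32 offsets, character data].
struct ToyArrayData {
  std::int64_t length;
  std::vector<ToyBuffer> buffers;
};

// Layer 3 (hypothetical): typed view that knows how to interpret the buffers.
class ToyStringArray {
public:
  explicit ToyStringArray(ToyArrayData data) : data_(std::move(data)) {}

  std::int64_t size() const { return data_.length; }

  // Bit i of the validity bitmap, least-significant bit first.
  bool is_valid(std::int64_t i) const {
    const auto& bitmap = data_.buffers[0].bytes;
    return (bitmap[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1u;
  }

  // Returns the i-th string, or nullopt when the slot is null.
  std::optional<std::string> value(std::int64_t i) const {
    if (!is_valid(i)) return std::nullopt;
    const std::int32_t begin = offset_at(i);
    const std::int32_t end = offset_at(i + 1);
    const auto& chars = data_.buffers[2].bytes;
    return std::string(chars.begin() + begin, chars.begin() + end);
  }

private:
  std::int32_t offset_at(std::int64_t i) const {
    std::int32_t out = 0;
    std::memcpy(&out,
                data_.buffers[1].bytes.data() +
                    static_cast<std::size_t>(i) * sizeof(out),
                sizeof(out));
    return out;
  }

  ToyArrayData data_;
};

// Builds the column ["ab", null, "cde"] from its three buffers.
ToyStringArray make_toy_strings() {
  ToyBuffer validity{{0x05}};  // bits 0 and 2 set: slots 0 and 2 are valid
  const std::int32_t offsets[] = {0, 2, 2, 5};
  ToyBuffer offset_buffer;
  offset_buffer.bytes.resize(sizeof(offsets));
  std::memcpy(offset_buffer.bytes.data(), offsets, sizeof(offsets));
  ToyBuffer chars{{'a', 'b', 'c', 'd', 'e'}};
  return ToyStringArray(ToyArrayData{3, {validity, offset_buffer, chars}});
}
```

Even this single logical column needs three separate buffers, which is the sense in which one column is "several arrays"; a chunked column would add another level on top.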