divan 4 days ago

> RPC is often accused of committing many of the fallacies of distributed computing.

> But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await.

I'm confused. How is this a "protocol" if its core premises rely on a very specific implementation of concurrency in a very specific language?

closeparen 4 days ago | parent | next [-]

"RPC" originally referred to a programming paradigm where remote calls looked just like any other method calls, and it might not even be any of the programmer's business whether they're implemented in-process or on another machine. This obviously required wire protocols, client and server libraries, etc. to implement.

There's been a renaissance in the tools, but now we mainly use them like "REST" endpoints with the type signatures of functions. Programming language features like Future and Optional make it easier to clearly delineate properties like "this might take a while" or "this might fail", whereas in earlier RPC systems these properties were kind of hidden.

kiitos 4 days ago | parent [-]

Mm, I think you're describing CORBA, not RPC in general.

closeparen 4 days ago | parent | next [-]

CORBA is trippier than that. A client’s request could include elements not normally serializable, like callbacks. A server could provide an object in response to your query and then continue mutating it, with the mutations reflected (effectively) in your address space, without your knowledge or participation.

kiitos 2 days ago | parent | next [-]

I am not really sure what you're talking about.

RPC is "remote procedure call", emphasis on "remote", meaning you're always necessarily going to be serializing/deserializing the information over some kind of wire, between discrete nodes with distinct address spaces.

A client request by definition can't include anything that can't be serialized; serialization is the ground-truth requirement for any kind of RPC...

A server doesn't provide "an object" in response to a query, it provides "a response payload", which is at most a snapshot of some state it had at the time of the request. It's not as if there is any expectation that this serialized state is going to stay consistent between nodes.

closeparen 2 days ago | parent | prev [-]

Nothing stops me from implementing a Thrift or gRPC handler that uses a field from the request to look up an object in a hashmap and then call one of its methods with data from the request. But a distributed object system will do this implicitly on my behalf, so that the programmer’s perspective is like passing objects (by reference) over the network.
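
Roughly the kind of plumbing I mean, as a hand-rolled TypeScript sketch (made-up names, not real Thrift/gRPC generated code): the "object reference" is just an ID the client echoes back to the server.

    interface Counter {
      increment(by: number): number;
    }

    // Server-side table of live objects, keyed by IDs handed out earlier.
    const liveObjects = new Map<string, Counter>();

    interface IncrementRequest {
      objectId: string;  // acts as the "object reference" on the wire
      by: number;
    }

    // An ordinary request/response handler that re-hydrates the reference
    // and calls a method on the live object. A distributed object system
    // generates this plumbing implicitly.
    function handleIncrement(req: IncrementRequest): { value: number } {
      const obj = liveObjects.get(req.objectId);
      if (!obj) throw new Error(`unknown object ${req.objectId}`);
      return { value: obj.increment(req.by) };
    }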

kentonv 4 days ago | parent | prev [-]

That's exactly what Cap'n Web does...

kentonv 4 days ago | parent | prev | next [-]

What do you mean? Async programming exists in tons of languages. Just off the top of my head, I've used async/await in JavaScript, C++, Python, Rust, C#, ...

Anyway, the point here is that early RPC systems worked by blocking the calling thread while performing the network request, which was obviously a terrible idea.
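
For contrast, a minimal TypeScript sketch (hypothetical stub, fake delay): an old-style stub would have tied up the calling thread for the whole round trip, whereas an async stub returns a promise and the thread stays free.

    interface User { name: string }

    // Stand-in for a generated RPC stub: instead of blocking the calling
    // thread for the whole network round trip, it returns a promise.
    function getUser(id: string): Promise<User> {
      return new Promise<User>((resolve) =>
        setTimeout(() => resolve({ name: `user-${id}` }), 50));  // fake network delay
    }

    async function greet(id: string): Promise<void> {
      // The await suspends only this task; the thread is free to run other
      // callbacks while the "request" is in flight.
      const user = await getUser(id);
      console.log(`hello, ${user.name}`);
    }

    greet("42");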

chao- 4 days ago | parent [-]

Reminds me of the old "MongoDB is Web Scale" series of comedy videos:

https://youtu.be/bzkRVzciAZg

Some friends and I still jokingly troll each other in the vein of these, interjecting with "When async programming was discovered in 2008...", or "When memory safe compiled languages were invented in 2012..." and so forth.

afiori 4 days ago | parent [-]

Async/await became ergonomic and widespread only recently. I am sure there were async systems in the '80s, but, for example, Node.js's focus on non-blocking I/O changed how a lot of people thought about servers and concurrency (whether Node was first is almost irrelevant).

Often, when something is discovered or invented is far less influential [1] than when it jumps on the hype train.

[1] The discovery is very important for historical and epistemological reasons, of course; rewriting the past is bad.

frollogaston 4 days ago | parent | next [-]

It's not a programming paradigm shift, more of a change to how runtimes work. We want to avoid the overhead of kernel threads in servers, and async/await on top of an event loop is a convenient way to do that, like in JS, Rust, and now Python.

Meanwhile Go doesn't have async/await and never will because it doesn't need it; it does greenthreading instead. Java has that too now.

Either way, your code waits on IO like before and does other work while it waits. But instead of the kernel doing the context switching, your runtime does something analogous at a higher layer.
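
Concretely, in a TypeScript sketch (made-up workers): at each await the runtime parks the task and runs whatever else is ready, much like the kernel would do for a blocked thread, just in user space.

    const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

    async function worker(name: string): Promise<void> {
      for (let i = 0; i < 3; i++) {
        await sleep(10);                    // "waiting on IO": the task is parked here
        console.log(`${name}: step ${i}`);  // resumes later, interleaved with the other worker
      }
    }

    // Both workers share one thread; the event loop switches between them
    // at the await points.
    Promise.all([worker("a"), worker("b")]);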

kentonv 4 days ago | parent [-]

I disagree that async/await is purely about avoiding the overhead of kernel threads. Kernel threads are actually not that expensive these days. You can have a server with 10,000 threads, no problem.

The problem is synchronization becomes extremely hard to reason about. With event loop concurrency, each continuation (callback) becomes effectively a transaction, in which you don't need to worry about anything else modifying your state out from under you. That legitimately makes a lot of things easier.
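
A TypeScript sketch of what that buys you (made-up account logic): the synchronous check-then-act needs no lock, because no other callback can run in between, and the guarantee only breaks if you put an await in the middle.

    const balances = new Map<string, number>();

    // One continuation: between the read and the write nothing else runs,
    // so there's no lock and no torn state.
    function withdraw(account: string, amount: number): boolean {
      const current = balances.get(account) ?? 0;
      if (current < amount) return false;
      balances.set(account, current - amount);
      return true;
    }

    async function auditLog(account: string, amount: number): Promise<void> {
      // pretend this writes somewhere remote
    }

    // Insert an await between the check and the act and the "transaction"
    // is split: another continuation may withdraw while we're suspended.
    async function withdrawRacy(account: string, amount: number): Promise<boolean> {
      const current = balances.get(account) ?? 0;
      await auditLog(account, amount);        // suspension point: other tasks run here
      if (current < amount) return false;
      balances.set(account, current - amount);
      return true;
    }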

The Cloudflare Workers runtime actually does both: There's a separate thread for each connection, but within each thread there's an event loop to handle all the concurrent stuff relating to that one connection. This works well because connections rarely need to interact with each other's state, but they need to mess with their own state constantly.

(Actually we have now gone further and stacked a custom green-threading implementation on top of this, but that's really a separate story and only a small incremental optimization.)

catern 4 days ago | parent | next [-]

I totally agree with your framing of the value of async/await, but could you elaborate more on why you think that this behavior (which I would call "cooperative concurrency") is important for (ocap?) RPC systems? It seems to me that preemptive concurrency also suffices to make RPC viable. Unless you just feel that preemptive concurrency is too hard, and therefore not workable for RPC systems?

kentonv 3 days ago | parent [-]

Almost all ocap systems seem to use event loops -- and many of the biggest ocap nerds I know are also the biggest event loop nerds I know. I'm not actually sure if this is a coincidence or if there's something inherent that makes it necessary to pair them.

But one thing I can't figure out: What would be the syntax for promise pipelining, if you aren't using promises to start with?
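
To make the question concrete, here's roughly what pipelining looks like when promises are first-class values (made-up stub types, loosely in the Cap'n Proto / Cap'n Web spirit, not the exact API): the dependent call is made on the unresolved result, so both calls go out in one round trip.

    interface UserRecord { name: string }
    interface PhotoStub extends Promise<Uint8Array> {}
    interface UserStub extends Promise<UserRecord> {
      getProfilePhoto(): PhotoStub;  // callable before the user has resolved
    }
    declare const api: { getUser(id: string): UserStub };

    async function fetchPhoto(userId: string): Promise<Uint8Array> {
      // Both calls are sent immediately and chained server-side; we only
      // wait once. Without pipelining you'd await getUser() first and pay
      // a second round trip for getProfilePhoto().
      return await api.getUser(userId).getProfilePhoto();
    }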

catern 3 days ago | parent [-]

>What would be the syntax for promise pipelining, if you aren't using promises to start with?

Oh, great point! That does seem really hard, maybe even intractable. That's definitely a reason to like cooperative concurrency, huh...

Just to tangent even further, but some ideas:

- Do it the ugly way: add an artificial layer of promises in an otherwise pre-emptive, direct-style language. That's just, unfortunately, quite ugly...

- Use a lazy language. Then everything's a promise! Some Haskell optimizations feel kind of like promise pipelining. But I don't really like laziness...

- Use iterator APIs; that's a slightly less artificial way to add layers of promises on top of things, but still weird...

- Punt to the language: build an RPC protocol into the language, and promise pipelining as a guaranteed optimization. Pretty inflexible, and E already tried this...

- Something with choreographic programming and modal-types-for-mobile-code? Such languages explicitly track the "location" of values, and that might be the most natural way to represent ocap promises: a promise is a remote value at some specific location. Unfortunately these languages are all still research projects...

frollogaston 4 days ago | parent | prev [-]

It's true that JS await is kinda like releasing a lock, but otherwise you'd just use a mutex whenever you access shared state, which is rare, as you said, and also easy to enforce in various langs nowadays.

kentonv 4 days ago | parent [-]

I said that shared state between connections is rare, but shared state within a connection is extremely common. And there are still multiple concurrent things going on within that connection context, requiring some concurrency mechanism. Locking mutexes everywhere sounds like a nightmare to me.

frollogaston 4 days ago | parent [-]

Ah I see. Well that is typically just fan-out-fan-in like "run these 4 SQL queries and RPCs in parallel and collect responses," nothing too complicated since the shared resources like the DB handle are usually thread-safe. It works out fine in Go and Java, even though I have unrelated reasons to avoid Go.
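
In event-loop terms the same fan-out / fan-in shape is just Promise.all (TypeScript sketch; the helpers here are hypothetical stand-ins for pooled DB queries and RPC stubs):

    interface Order {}
    interface Invoice {}
    interface Profile {}

    declare function queryOrders(userId: string): Promise<Order[]>;
    declare function queryInvoices(userId: string): Promise<Invoice[]>;
    declare function fetchRecommendations(userId: string): Promise<string[]>;
    declare function fetchProfile(userId: string): Promise<Profile>;

    async function loadDashboard(userId: string) {
      // Fan out: all four calls start concurrently against shared,
      // thread-safe resources (connection pool, RPC clients)...
      const [orders, invoices, recs, profile] = await Promise.all([
        queryOrders(userId),
        queryInvoices(userId),
        fetchRecommendations(userId),
        fetchProfile(userId),
      ]);
      // ...fan in: everything is collected here before we continue.
      return { orders, invoices, recs, profile };
    }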

branko_d 3 days ago | parent | next [-]

“Running 4 SQL queries in parallel” is not thread-safe if done in separate transactions, and on data that is not read-only.

If some other transaction commits at just the wrong time, it could change the result of some of these queries but not all. The results would not be consistent with each other.

frollogaston 3 days ago | parent [-]

Thread-safe just means that the threading by itself doesn't break anything. The race condition you're describing is outside this scope and would happen the same in a single-threaded event loop.

Btw if you really want consistent multi reads, some DBMSes support setting a read timestamp, but the common ones don't.

branko_d 3 days ago | parent [-]

> would happen the same in a single-threaded event loop

Well...if you implemented a relational DBMS server without using threads. To my knowledge, no such DBMS exists, so the distinction seems rather academic.

> Btw if you really want consistent multi reads, some DBMSes support setting a read timestamp, but the common ones don't.

Could you elaborate? I can't say I've heard of that mechanism. Perhaps you are referring to something like Oracle flashback queries or SQL Server temporal tables?

Normally, I'd use MVCC-based "snapshot" transaction isolation for consistency between multiple queries, though they would need to be executed serially.

frollogaston 2 days ago | parent [-]

I was talking about the client side here, which is maybe a web backend. If it's using threads, at least the connection pool will be thread-safe. If it's an event loop, N/A.

If you want to look at the DBMS itself, well typically there's a separate process per connection, but say it uses threading instead... It'd be thread-safe too. You aren't hitting UB by doing concurrent xacts.

Snapshot xact is what I was thinking about. Not sure about Oracle, but in Spanner they can be parallel.

kentonv 4 days ago | parent | prev [-]

The Cloudflare Workers runtime is 1000x more complicated than your average web application. :)

divan 4 days ago | parent | prev [-]

I actually hate the async/await approach to concurrency and avoid it as much as I can.

My mental model is that it's the caller who decides how a call should be executed (synchronously or asynchronously). A synchronous call is when the caller waits till completion/error; an asynchronous call is when the caller puts the call in the background (whatever that means in that language/context) and handles the return results later. The CSP concurrency model [1] is the closest fit here.

It's not a property of the function to decide how the caller should deal with it. This frustration was partly described in the viral article "What color is your function?" [2], but my main rant about this concurrency approach is that it doesn't match well with how we think and reason about concurrent processes, and it requires cognitive gymnastics to reason about relatively simple code.

Seeing "async/await/Promises/Futures" used as the justification for a "protocol" makes little sense to me. I can totally get that they reimagined how to do RPC with first-class async/await primitives, but that doesn't make it a network "protocol".

[1] https://en.wikipedia.org/wiki/Communicating_sequential_proce...

[2] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

josephg 4 days ago | parent | next [-]

I love this about seL4. seL4 defines a capability-based API between processes, and the invoking functions have both synchronous and asynchronous variants (i.e., send, sendAsync, recv, recvAsync, etc.). How you want to use any remote function is up to you!

pests 4 days ago | parent | prev [-]

Can’t you just write everything default-async and then if you want sync behavior just await immediately?
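
I.e., something like this TypeScript sketch (hypothetical loadConfig): every function returns a promise, and the call site decides whether to wait right away or later.

    interface Config { port: number }

    async function loadConfig(): Promise<Config> {
      return { port: 8080 };  // stand-in for real async work
    }

    async function main(): Promise<void> {
      const config = await loadConfig();   // "sync-style": wait immediately
      const pending = loadConfig();        // or keep the promise around...
      // ...do other work...
      const config2 = await pending;       // ...and collect it later
      console.log(config.port, config2.port);
    }

    main();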

afiori 4 days ago | parent [-]

That is terrible for performance, and some operations have external requirements to be sync.
