mtrovo 4 days ago

The section on how they solved arrays is fascinating and terrifying at the same time https://blog.cloudflare.com/capnweb-javascript-rpc-library/#....

> .map() is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array.

> But the application code just specified a JavaScript method. How on Earth could we convert this into the narrow DSL? The answer is record-replay: On the client side, we execute the callback once, passing in a special placeholder value. The parameter behaves like an RPC promise. However, the callback is required to be synchronous, so it cannot actually await this promise. The only thing it can do is use promise pipelining to make pipelined calls. These calls are intercepted by the implementation and recorded as instructions, which can then be sent to the server, where they can be replayed as needed.
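The record-replay idea quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not Cap'n Web's implementation: `makePlaceholder`, `record`, `replay`, the `Arg`/`Instruction`/`Program` types and the `getPhoto` method are all hypothetical names, and the API stub is passed to the callback explicitly to keep the example short.

```typescript
// Hypothetical sketch of record/replay. None of these names come from Cap'n Web.
type Arg =
  | { kind: "input"; path: string[] }   // a property path on the array element
  | { kind: "result"; index: number }   // the result of an earlier recorded call
  | { kind: "literal"; value: unknown };
type Instruction = { method: string; args: Arg[] };
type Program = { instructions: Instruction[]; result: Arg };
type Server = Record<string, (...args: any[]) => unknown>;

const PATH = Symbol("path");

class ResultRef {
  index: number;
  constructor(index: number) { this.index = index; }
}

// The placeholder never holds a value; it only remembers which path was read.
function makePlaceholder(path: string[]): any {
  return new Proxy({}, {
    get(_target, prop) {
      if (prop === PATH) return path;
      if (typeof prop === "symbol") return undefined;
      return makePlaceholder([...path, prop]);
    },
  });
}

function encode(value: any): Arg {
  if (value instanceof ResultRef) return { kind: "result", index: value.index };
  if (value !== null && typeof value === "object" && Array.isArray(value[PATH])) {
    return { kind: "input", path: value[PATH] };
  }
  return { kind: "literal", value };
}

// Client side: run the callback once and record every call made on the API stub.
function record(callback: (item: any, api: any) => unknown): Program {
  const instructions: Instruction[] = [];
  const api = new Proxy({}, {
    get(_target, method) {
      return (...args: unknown[]) => {
        instructions.push({ method: String(method), args: args.map(encode) });
        return new ResultRef(instructions.length - 1);
      };
    },
  });
  const result = encode(callback(makePlaceholder([]), api));
  return { instructions, result };
}

// Server side: replay the recorded instructions against one real element.
function replay(program: Program, server: Server, input: unknown): unknown {
  const results: unknown[] = [];
  const resolve = (arg: Arg): unknown => {
    if (arg.kind === "literal") return arg.value;
    if (arg.kind === "result") return results[arg.index];
    let value: any = input;
    for (const key of arg.path) value = value[key];
    return value;
  };
  for (const ins of program.instructions) {
    results.push(server[ins.method](...ins.args.map(resolve)));
  }
  return resolve(program.result);
}

const program = record((user, api) => api.getPhoto(user.id));
const server: Server = { getPhoto: (id: number) => `photo-${id}` };
const photos = [{ id: 1 }, { id: 2 }].map((u) => replay(program, server, u));
console.log(photos); // ["photo-1", "photo-2"]
```

The callback runs exactly once, against the placeholder; everything the "server" does afterwards is driven by the recorded instruction list, which is why the callback cannot branch on real values.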

mdasen 4 days ago | parent | next [-]

In C#, there's expression trees which handle things like this and it's how Entity Framework is able to convert the lambdas it's given into SQL. This means that you can pass around code that can be inspected or transformed instead of being executed. Take this EntityFramework snippet:

    db.People.Where(p => p.Name == "Joe")
`Where` takes an `Expression<Func<T, bool>> predicate`. It isn't taking the `Func` itself, but an `Expression` of it so that it can look at the code rather than execute it. It can see that it's trying to match the `Name` field to the value "Joe" and translate that into a SQL WHERE clause.

Since JS doesn't have this, they have to pass in a special placeholder value and try to record what the code is doing to that value.

squirrellous 4 days ago | parent | next [-]

Is there anything C# _doesn’t_ have? :-)

It feels like C# has an answer to every problem I’ve ever had with other languages - dynamic loading, ADTs with pattern matching, functional programming, whatever this expression tree is, reflection, etc etc. Yet somehow it’s still a niche language that isn't widely used (outside of particular ecosystems).

rjbwork 4 days ago | parent | next [-]

It's one of the most widely used languages out there actually. But it's primarily used at buttoned up and boring SMB's/enterprise backoffices. We're not out here touting our new framework of the month to kafloogle the whatzit. We're just building systems with a good language and ecosystem that's getting better every year.

I've worked only at startups/small businesses since I graduated university and it's all been in C#.

brainzap 4 days ago | parent [-]

getting better? many packages we been using did a license swap on us xD

fucking nice ecosystem

rjbwork 3 days ago | parent [-]

Fork it. End of the day some guys decided they wanted to make money and the corporations profiting off their labor weren't paying up. These things don't happen in a vacuum. Does your company have a multi-thousand dollar a year budget to make sure your dependencies are sustainable?

garblegarble 3 days ago | parent | prev | next [-]

>Is there anything C# _doesn’t_ have?

You were maybe already getting at it, but as a kitchen sink language the answer is "simplicity". All these diverse language features increase cognitive load when reading code, so it's a complexity/utility tradeoff

vaylian 4 days ago | parent | prev | next [-]

It's funny how C# started out as a Java clone and then added a ton of features while Java stayed very conservative with new language features. And both languages are fine.

dominicrose 3 days ago | parent | prev | next [-]

As someone who dislikes clutter, in my experience it's just easier to read and write with these languages: Perl, PHP, Ruby, Python, Javascript, Smalltalk.

If you dare leave the safety of a compiler you'll find that Sublime Merge can still save you when rewriting a whole part of an app. That and manual testing (because automatic testing is also clutter).

If you think it's more professional to have a compiler I'd like to agree but then why did I run into a PHP job when looking for a Typescript one? Not an uncommon unfolding of events.

FungalRaincloud 3 days ago | parent [-]

I'm a bit surprised that you put PHP in that list. My current workload is in it, and a relatively modern version of it, so maybe that surprise will turn around soon, but I've always felt that PHP was more obnoxious than even C to read and write.

Granted, I started out on LISP. My version of "easy to read and write" might be slightly masochistic. But I love Perl, and Python and Javascript are definitely "you can jump in and get shit done if you have worked in most languages. It might not be idiomatic, but it'll work"...

dominicrose 2 days ago | parent [-]

PHP is easy to get into because of the simple (and tolerant) syntax and extremely simple static typing system. The weak typing also means it's easier for beginners.

It does require twice the lines of PHP code to make a Ruby or Python program equivalent, or more if you add phpdoc and static types though, so it is easier to read/write Ruby or Python, but only after learning the details of the language. Ruby's syntax is very expressive but very complex if you don't know it by heart.

gwbas1c 3 days ago | parent | prev | next [-]

Good abstractions around units (Apologies if there is a specific terminology that I should use.)

Specifically, I'd like to be able to have "inches" as a generic type, where it could be an int, long, float, double. Then I'd also like to have "length" as a generic type where it could be inches as a double, millimeters as a long, etc., etc.

I know they added generic math to the language in C# 11 (.NET 7), so maybe there is a way to do it?

evntdrvn 3 days ago | parent [-]

Check out F# "units of measure" ;)

uzerfcwn 4 days ago | parent | prev [-]

> Is there anything C# _doesn’t_ have?

Pi types, existential types and built-in macros to name a few.

moomin 4 days ago | parent [-]

Sum types are the ones I really miss. The others would be nice but processing heterogeneous streams is my biggest practical issue.

drysart 4 days ago | parent | prev | next [-]

There are inherent limitations with the "execute it once and see what happens" approach; namely that any conditional logic that might be in the mapping function is going to silently get ignored. For example, `db.people.map(p => p.IsPerson ? (p.FirstName + ' ' + p.LastName) : p.EntityName)` would either be seen as reading `(IsPerson, FirstName, LastName)` or `(IsPerson, EntityName)` depending on the specific behavior of the placeholder value ... and neither of those sets is fully correct.

I wonder why they don't just do `.toString()` on the mapping function and then parse the resulting Javascript into an AST and figure out property accesses from that. At the very least, that'd allow the code to properly throw an error in the event the callback contains any forbidden or unsupported constructs.

kentonv 4 days ago | parent | next [-]

The placeholder value is an RpcPromise. Which means that all its properties are also RpcPromises. So `p.IsPerson` is an RpcPromise. I guess that's truthy, so the expression will always evaluate to `(p.FirstName + ' ' + p.LastName)`. But that's going to evaluate to '[object Object] [object Object]'. So your mapper function will end up not doing anything with the input at all, and you'll get back an array full of '[object Object] [object Object]'.

Unfortunately, "every object is truthy" and "every object can be coerced to a string even if it doesn't have a meaningful stringifier" are just how JavaScript works and there's not much we can do about it. If not for these deficiencies in JS itself, then your code would be flagged by the TypeScript compiler as having multiple type errors.
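A minimal sketch of the two JavaScript behaviours described above, using plain objects as stand-ins for RpcPromise properties (an assumption; the real placeholder is not a plain object, and `stub` and `mapper` are hypothetical names):

```typescript
// Plain objects stand in for the promise-valued properties of the placeholder.
const stub: Record<string, object> = {
  IsPerson: {},
  FirstName: {},
  LastName: {},
  EntityName: {},
};

// Every object is truthy, so the first branch is always taken, and every
// object can be coerced to a string, so the concatenation "succeeds".
const mapper = (p: Record<string, object>): string =>
  p.IsPerson ? `${p.FirstName} ${p.LastName}` : String(p.EntityName);

const recorded = mapper(stub);
console.log(recorded); // "[object Object] [object Object]"
```

The result no longer depends on the input at all, which matches the "array full of '[object Object] [object Object]'" outcome described in the comment.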

drysart 4 days ago | parent | next [-]

Yeah I'll definitely chalk this up to my not having more than a very very passing idea of the API surface of your library based on a quick read over just the blog post.

On a slightly less trivial skim, it looks like the intention here isn't to map property-level subsets of the returned data (e.g., only getting the `FirstName` and `LastName` properties of a larger object) so much as it is to do joins; and it's not data entities being provided to the mapping function but RpcPromises, so individual property values aren't even available anyway.

So I guess I might argue that map() isn't a good name for the function because it immediately made me think it's for doing a mapping transformation and not for basically just specifying a join (since you can't really transform the data) since that's what map() can do everywhere else in Javascript. But for all I know that's more clear when you're actually using the library, so take what I think with a heaping grain of salt. ;)

Aeolun 4 days ago | parent | prev | next [-]

Couldn’t you make this safer by passing the map something that’s not a plain JS function? I confess to that being the only thing that had me questioning the logic. If I can express everything, then everything should work. If it’s not going to work, I don’t want to be able to express it.

kentonv 4 days ago | parent [-]

I think any other syntax would likely be cumbersome. What we actually want to express here is function-shaped: you have a parameter, and then you want to substitute it into one or more RPC calls, and then compute a result. If you're going to represent that with a bunch of data structures, you end up with a DSL-in-JSON type of thing and it's going to be unwieldy.

pcthrowaway 4 days ago | parent [-]

I suspect there is prior work to draw from that could make this feasible for you... Have a look at how something like MongoDB handles conditional logic for example.

kentonv 3 days ago | parent [-]

Yeah that's what I mean by DSL-in-JSON. I think it's pretty clunky. It's also (at least in Mongo's formulation, at least when I last used it ~10 years ago) very vulnerable to query injection.

skybrian 4 days ago | parent | prev [-]

Another way to screw this up would be to have an index counter and do something different based on the index. I think the answer is "don't do that."

kentonv 4 days ago | parent [-]

Hmm, but I should make it so the map callback can take the index as the second parameter probably. Of course, it would actually be a promise for the index, so you couldn't compute on it, but there might be other uses...

kentonv 4 days ago | parent | prev [-]

> I wonder why they don't just do `.toString()` on the mapping function and then parse the resulting Javascript into an AST and figure out property accesses from that.

That sounds incredibly complicated, and not something we could do in a <10kB library!

actionfromafar 4 days ago | parent | next [-]

Maybe Fabrice Bellard could spare an afternoon.

sonthonax 4 days ago | parent | prev [-]

To the contrary, a simple expression language is one of those things that can easily be done in that size.

kentonv 3 days ago | parent [-]

But the suggestion wasn't to design a simple expression language.

The suggestion was to parse _JavaScript_. (That's what `.toString()` on a function does... gives you back the JavaScript.)
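For reference, a tiny sketch of what `.toString()` hands back (the `predicate` name is just for illustration). The exact text depends on how the code was compiled or bundled, so no particular output is assumed here; the point is that it is JavaScript source, which would need a full JavaScript parser to analyze.

```typescript
const predicate = (p: { Name: string }) => p.Name === "Joe";

// A string of JavaScript source for the function, after whatever
// transpilation or bundling was applied to it.
const predicateSource: string = predicate.toString();
console.log(predicateSource);
```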

keyle 4 days ago | parent | prev | next [-]

C#, Swift, Dart, Rust... Python. Many languages take lambda/predicate/closure as filter/where.

It generally unrolls as a `for loop` underneath, or in this case LINQ/SQL.

C# was innovative for doing it first in the scope of SQL. I remember the arrival of LINQ... Good times.

sobani 4 days ago | parent | prev [-]

How many of those languages can take an expression instead of a lambda?

Func<..> is lambda that can only be invoked.

Expression<Func..>> is an AST of a lambda that can be transformed by your code/library.

Tyr42 3 days ago | parent [-]

R lets you do that, and it gets used by the tidyverse libraries to do things like change the scope that variables in the functions are looked up in.

notpushkin 4 days ago | parent | prev | next [-]

PonyORM does something similar in Python:

    select(c for c in Customer if sum(c.orders.total_price) > 1000)
I love the hackiness of it.
porridgeraisin 4 days ago | parent [-]

PonyORM is my favourite python ORM.

Along with https://pypi.org/project/pony-stubs/, you get decent static typing as well. It's really quite something.

rafaelgoncalves 4 days ago | parent [-]

whoa, didn't know PonyORM, looks really neat! thanks for showing

javier2 4 days ago | parent | prev [-]

I don't think C# looks at the code? I suspect it can track that you called p.Name, then generate SQL with this information?

ziml77 4 days ago | parent | next [-]

The C# compiler is looking at that code. It sees that the lambda is being passed into a function which accepts an Expression as a parameter, so it compiles the lambda as an expression tree rather than behaving like it's a normal delegate.

adzm 4 days ago | parent | prev [-]

The lambda is converted into an Expression, basically a syntax tree, which is then analyzed to see what is accessed.

javier2 3 days ago | parent [-]

Ok, so it's a step in the compiler that rewrites and analyzes it?

adzm 2 days ago | parent [-]

It's actually really neat. Normally a lambda would just be a function but if it gets cast to an Expression then you get the AST at runtime. Entity Framework analyzes these at runtime to generate the SQL and logic/mapping etc. You can get the AST at compile time in some situations with a Roslyn plug-in or etc I believe as well.
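To make the "analyze the tree at runtime" step concrete, here is a rough TypeScript sketch of walking an expression tree and emitting a parameterized SQL condition. It is not how Entity Framework works internally: the `Expr` type and `toSql` function are hypothetical, and the tree is written out by hand because JavaScript has no compiler support for producing one from a lambda.

```typescript
// Hypothetical, hand-rolled expression tree; not the .NET Expression API.
type Expr =
  | { kind: "member"; name: string }
  | { kind: "constant"; value: string | number }
  | { kind: "binary"; op: "==" | ">" | "&&"; left: Expr; right: Expr };

// Walk the tree and build a SQL condition, collecting constants as parameters.
function toSql(e: Expr, params: (string | number)[]): string {
  switch (e.kind) {
    case "member":
      return `"${e.name}"`;
    case "constant":
      params.push(e.value);
      return `$${params.length}`;
    case "binary": {
      const ops = { "==": "=", ">": ">", "&&": "AND" };
      return `(${toSql(e.left, params)} ${ops[e.op]} ${toSql(e.right, params)})`;
    }
  }
}

// Roughly the tree a compiler could produce for: p => p.Name == "Joe"
const tree: Expr = {
  kind: "binary",
  op: "==",
  left: { kind: "member", name: "Name" },
  right: { kind: "constant", value: "Joe" },
};

const params: (string | number)[] = [];
const where = toSql(tree, params);
console.log(where, params); // ("Name" = $1) [ 'Joe' ]
```

Because the translator sees the whole tree, an unsupported node can be rejected with an error instead of being silently skipped, which is the main advantage over running the callback against a placeholder.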

samwillis 4 days ago | parent | prev | next [-]

This record and replay trick is very similar to what I recently used to implement the query DSL for TanStack DB (https://tanstack.com/db/latest/docs/guides/live-queries). We pass a RefProxy object into the where/select/join callbacks and use it to trace all the props and expressions that are performed. As others have noted you can't use js operators to perform actions, so we built a set of small functions that we could trace (eq, gt, not etc.). These callbacks are run once to trace the calls and build an IR of the query.
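A rough sketch of that tracing approach, assuming nothing about TanStack DB's real internals: `refProxy`, `toIR`, `tracedOp`, `trace` and the `IR` shape are hypothetical, and only the operator names `eq`, `gt` and `and` are borrowed from the thread.

```typescript
// Hypothetical IR and helpers; not TanStack DB's actual types.
type IR =
  | { type: "ref"; path: string[] }
  | { type: "val"; value: unknown }
  | { type: "func"; name: string; args: IR[] };

const REF = Symbol("ref");

// A proxy that only remembers which property path was accessed.
function refProxy(path: string[]): any {
  return new Proxy({}, {
    get(_target, prop) {
      if (prop === REF) return path;
      if (typeof prop === "symbol") return undefined;
      return refProxy([...path, prop]);
    },
  });
}

// Convert a proxy, an already-built IR node, or a plain value into IR.
function toIR(x: any): IR {
  if (x !== null && typeof x === "object") {
    if (Array.isArray(x[REF])) return { type: "ref", path: x[REF] };
    if (x.type === "func") return x as IR;
  }
  return { type: "val", value: x };
}

// Traceable "operators": ordinary functions that return IR instead of computing.
const tracedOp = (name: string) => (...args: unknown[]): IR =>
  ({ type: "func", name, args: args.map((a) => toIR(a)) });
const eq = tracedOp("eq");
const gt = tracedOp("gt");
const and = tracedOp("and");

// Run the callback once against the proxy to obtain the query IR.
function trace(callback: (row: any) => unknown): IR {
  return toIR(callback(refProxy([])));
}

const ir = trace((row) => and(row.user.active, gt(row.order.amount, 1000)));
console.log(JSON.stringify(ir, null, 2));
```

Native operators such as `&&` and `>` cannot be intercepted this way, which is why the comparison has to go through functions like `gt`.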

One thing we were surprisingly able to do is trace the js spread operation as that is a rare case of something you can intercept in JS.

Kenton, if you are reading this, could you add a series of fake operators (eq, gt, in etc) to provide the capability to trace and perform them remotely?

kentonv 4 days ago | parent | next [-]

Yes, in principle, any sort of remote compute we want to support, we could accomplish by having a custom function you have to call for it. Then the calls can be captured into the record.

But also, apps can already do this themselves. Since the record/replay mechanism already intercepts any RPC calls, the server can simply provide a library of operations as part of its RPC API. And now the mapper callback can take advantage of those.

I think this is the approach I prefer: leave it up to servers to provide these ops if they want to. Don't extend the protocol with a built-in library of ops.

samwillis 4 days ago | parent [-]

Ah, yes, obviously. This is all very cool!

IshKebab 3 days ago | parent | prev | next [-]

As I understand it, it's basically how PyTorch works. A clever trick but also super confusing because while it seems like normal code, as soon as you try and do something that you could totally do in normal code it doesn't work:

  let friendsWithPhotos = friendsPromise.map(friend => {
    return {friend, photo: friend.has_photo ? api.getUserPhoto(friend.id) : default_photo};
  });
Looks totally reasonable, but it's not going to work properly. You might not even realise until it's deployed.
svieira 4 days ago | parent | prev [-]

Just a side note - reading https://tanstack.com/db/latest/docs/guides/live-queries#reus... I see:

    const isHighValueCustomer = (row: { user: User; order: Order }) => 
     row.user.active && row.order.amount > 1000
But if I'm understanding the docs correctly on this point, doesn't this have to be:

    const isHighValueCustomer = (row: { user: User; order: Order }) => 
      and(row.user.active, gt(row.order.amount, 1000))
samwillis 4 days ago | parent [-]

Yep, that's an error in the docs..

mdavidn 4 days ago | parent | prev | next [-]

Reminds me of Temporal.io "workflows," which are functions that replace the JSON workflow definitions of AWS Step Functions. If a workflow function's execution is interrupted, Temporal.io expects to be able to deterministically replay the workflow function from the beginning, with it yielding the same sequence of decisions in the form of callbacks.

miki123211 4 days ago | parent | prev | next [-]

I kind of want to see an ORM try something like this.

You can't do this in most languages because of if statements, which cannot be analyzed in that way and break the abstraction. You'd either need macro-based function definitions (Lisp, Elixir), bytecode inspection (like in e.g. Pytorch compile), or maybe built-in laziness (Haskell).

Edit: Or full object orientation like in Smalltalk, where if statements are just calls to .ifTrue and .ifFalse on a true/false object, and hence can be simulated.
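The Smalltalk-style idea can be sketched in TypeScript as well: if the condition is an object with an `ifTrueIfFalse` method rather than a native boolean, tracing can evaluate both branches and record the choice. Everything here (`Term`, `SymBool`, `field`, `evaluate`) is a hypothetical illustration, not an existing library.

```typescript
// Hypothetical symbolic terms recorded during tracing.
type Term =
  | { kind: "field"; name: string }
  | { kind: "if"; cond: Term; whenTrue: Term; whenFalse: Term };

// A "boolean" that cannot be branched on natively; callers must hand over
// both continuations, so neither branch is lost during recording.
class SymBool {
  cond: Term;
  constructor(cond: Term) {
    this.cond = cond;
  }
  ifTrueIfFalse(onTrue: () => Term, onFalse: () => Term): Term {
    return { kind: "if", cond: this.cond, whenTrue: onTrue(), whenFalse: onFalse() };
  }
}

const field = (name: string): Term => ({ kind: "field", name });

// Record once: both branches end up in the recorded term.
const traced = new SymBool(field("IsPerson")).ifTrueIfFalse(
  () => field("FirstName"),
  () => field("EntityName"),
);

// Replay later against real rows, where the condition has a real value.
function evaluate(t: Term, row: Record<string, unknown>): unknown {
  switch (t.kind) {
    case "field":
      return row[t.name];
    case "if":
      return evaluate(t.cond, row) ? evaluate(t.whenTrue, row) : evaluate(t.whenFalse, row);
  }
}

const person = evaluate(traced, { IsPerson: true, FirstName: "Ada", EntityName: "ACME" });
const company = evaluate(traced, { IsPerson: false, FirstName: "Ada", EntityName: "ACME" });
console.log(person, company); // Ada ACME
```

The cost is that callers have to write conditionals as method calls, which is the kind of non-function-shaped syntax discussed elsewhere in the thread.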

spankalee 4 days ago | parent | prev | next [-]

I presume conditionals are banned - sort of like the rules of hooks - but how?

kentonv 4 days ago | parent [-]

The input to the map function (when it is called in "record" mode on the client) is an RpcPromise for the eventual value. That means you can't actually inspect the value, you can only queue pipelined calls on it. Since you can't inspect the value, you can't do any computation or branching on it. So any computation and branching you do perform must necessarily have the same result every time the function runs, and so can simply be recorded and replayed.

The only catch is your function needs to have no side effects (other than calling RPC methods). There are a lot of systems out there that have similar restrictions.

btown 4 days ago | parent | next [-]

> your function needs to have no side effects

I'm trying to understand how well this no-side-effects footgun is defended against.

https://github.com/cloudflare/capnweb/blob/main/src/map.ts#L... seems to indicate that if the special pre-results "record mode" call of the callback raises an error, the library silently bails out (but keeps anything already recorded, if this was a nested loop).

That catches a huge number of things like conditionals on `item.foo` in the map, but (a) it's quite conservative and will fail quite often with things like those conditionals, and (b) if I had `count += 1` in my callback, where count was defined outside the scope, now that's been incremented one extra time, and it didn't raise an error.
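Point (b) can be illustrated with a toy stand-in for a record/replay map. `recordingMap` is hypothetical and far simpler than the real library; it only shows that the callback body runs once, at record time, regardless of how many elements are later processed.

```typescript
// Hypothetical toy: records the property path read by the callback,
// then replays that path for each real item without calling the callback again.
function recordingMap(
  items: Array<Record<string, unknown>>,
  callback: (item: any) => unknown,
): unknown[] {
  const path: string[] = [];
  const placeholderItem: any = new Proxy({}, {
    get(_target, prop) {
      path.push(String(prop));
      return placeholderItem;
    },
  });
  callback(placeholderItem); // record mode: the callback runs exactly once
  return items.map((item) => {
    let value: any = item;
    for (const key of path) value = value[key];
    return value;
  });
}

let count = 0;
const names = recordingMap([{ name: "a" }, { name: "b" }, { name: "c" }], (item) => {
  count += 1; // side effect outside the recording: not replayed per element
  return item.name;
});
console.log(names, count); // [ 'a', 'b', 'c' ] 1
```

No error is raised here; the counter simply ends up reflecting the recording pass rather than the number of elements, which is the kind of silent mismatch a linter rule would be meant to catch.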

React Hooks had a similar problem, with a constraint that hooks couldn't be called conditionally. But they solved their DX by having a convention where every hook would start with `use`, so they could then build linters that would enforce their constraint. And if I recall, their rules-of-hooks eslint plugin was available within days of their announcement.

The problem with `map` is that there are millions of codebases that already use a method called `map`. I'd really, really love to see Cap'n Web use a different method name - perhaps something like `smartMap` or `quickMap` or `rpcMap` - that is more linter-friendly. A method name that doesn't require the linter to have access to strong typing information, to understand that you're mapping over the special RpcPromise rather than a low-level array.

Honestly, it's a really cool engineering solve, with the constraint of not having access to the AST like one has in Python. I do think that with wider adoption, people will find footguns, and I'd like this software to get a reputation for being resilient to those!

da25 4 days ago | parent [-]

This! Using the same name, i.e. `.map()`, is a footgun that devs would eventually stumble upon. `rpcMap()` sounds good. cc: @kentonv

tonyg 4 days ago | parent | prev | next [-]

Is .map special-cased or do user functions accepting callbacks work the same way? Because you could do the Scott-Mogensen thing of #ifTrue:ifFalse: if so, dualizing the control-flow decision making, offering a menu of choices/continuations.

kentonv 4 days ago | parent [-]

.map() is totally special-cased.

For any other function accepting a callback, the function on the server will receive an RPC stub, which, when called, makes an RPC back to the caller, calling the original version of the function.

This is usually what you want, and the semantics are entirely normal.

But for .map(), this would defeat the purpose, as it'd require an additional network round-trip to call the callback.

qcnguy 4 days ago | parent | next [-]

What about filter? Seems useful also.

kentonv 4 days ago | parent [-]

I don't think you could make filter() work with the same approach, because it seems like you'd actually have to do computation on the result.

map() works for cases where you don't need to compute anything in the callback, you just want to pipeline the elements into another RPC, which is actually a common case with map().

If you want to filter server-side, you could still accomplish it by having the server explicitly expose a method that takes an array as input, and performs the desired filter. The server would have to know in advance exactly what filter predicates are needed.

svieira 4 days ago | parent | next [-]

But you might want to compose various methods on the server in order to filter, just like you might want to compose various methods on the server in order to transform. Why is `collection.map(server.lookupByInternalizedId)` a special case that doesn't require `server.lookupCollectionByInternalizedId(collection)`, but `collection.filter(server.isOperationSensibleForATuesday)` is a bridge too far and for that you need `server.areOperationsSensibleForATuesday(collection)`?

kentonv 4 days ago | parent [-]

I agree that, in the abstract, it's inconsistent.

But in the concrete:

* Looking up some additional data for each array element is a particularly common thing to want to do.

* We can support it nicely without having to create a library of operations baked into the protocol.

I really don't want to extend the protocol with a library of operations that you're allowed to perform. It seems like that library would just keep growing and add a lot of bloat and possibly security concerns.

(But note that apps can actually do so themselves. See: https://news.ycombinator.com/item?id=45339577 )

5Qn8mNbc2FNCiVV 2 days ago | parent | prev [-]

Couldn't this be done in some way when validation exists, that the same validation is used to create a "better" placeholder value that may be able to be used with specific conditional functions? (eq(), includes(), etc.)

svieira 4 days ago | parent | prev [-]

Doesn't this apply for _all_ the combinators on `Array.prototype` though? Why special-case `.map` only?

kentonv 4 days ago | parent [-]

See cousin comment: https://news.ycombinator.com/item?id=45338969

spankalee 4 days ago | parent | prev | next [-]

But you also can't close over anything right?

I did a spiritually similar thing in JS and Dart before where we read the text of the function and re-parsed (or used mirrors in Dart) to ensure that it doesn't access any external values.

kentonv 4 days ago | parent [-]

You actually CAN close over RPC stubs. The library will capture any RPC calls made during the mapper callback, even on stubs other than the input. Those stubs are then sent along to the server with the replay instructions.

fizx 4 days ago | parent | prev [-]

Also, your function needs to be very careful on closures. Date.toLocaleString and many other js functions will be different on client and server, which will also cause silent corruption.

kentonv 3 days ago | parent [-]

If you invoke `Date.toLocaleString()` in a map callback, it will consistently always run on the client.

fizx 3 days ago | parent [-]

I don't see how this very contrived example pipelines:

    client.getAll({userIds}).map((user) => user.updatedAt == new Date().toLocaleString() ? client.photosFor(user.id) : {})
or without the conditional,

    client.getAll({userIds}).map((user) => client.photos({userId: user.id, since: new Date(user.updatedAt).toLocaleString()}))
Like it has to call toLocaleString on the server, no?
kentonv 3 days ago | parent [-]

Neither of these will type check.

You can't perform computation on a promise. The only thing you can do is pipeline on it.

`user.updatedAt == date` is trying to compare a promise against a date. It won't type check.

`new Date(user.updatedAt)` is passing a promise to the Date constructor. It won't type check.

tobyhinloopen 4 days ago | parent | prev | next [-]

That's neat, tnx for pointing it out

__alexs 4 days ago | parent | prev [-]

So now I have yet another colour of function? Fun.