gr__or 4 days ago

Text surely is a hill, but I believe it's a local one we got stuck on, due to our short-sighted inability to go down into a valley for a few miles until we find the (projectional) mountain.

All of your examples work better for code with structural knowledge:

- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep

- diff: https://semanticdiff.com (and others), i.e.: hide noisy syntax-only changes, attempt to capture moved code. I say attempt, because with projectional programming we could have a more expressive notion of code being moved

- sed: https://npmjs.com/package/@codemod/cli

- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
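To make the grep bullet concrete, here's a minimal sketch of the difference between a symbol search and a text grep. It uses Python's stdlib `ast` module and a made-up `connect` example; real tools like ast-grep do this across languages via tree-sitter, but the idea is the same: match the structure, not the characters.

```python
import ast

# Made-up snippet: two real calls to `connect`, one textual mention.
source = '''
db.connect(host="x")   # connect here
log("please connect")  # textual mention only, not a call
connect(
    retries=3,
)
'''

# Structural search: find *calls* to `connect`, ignoring comments,
# strings, and formatting -- a text grep would match all four lines.
calls = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Call):
        fn = node.func
        name = fn.attr if isinstance(fn, ast.Attribute) else getattr(fn, "id", None)
        if name == "connect":
            calls.append(node.lineno)

print(calls)  # [2, 4] -- only the actual call sites
```

Note that the multi-line call is found just as easily as the one-liner, which is exactly what line-oriented grep struggles with.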

zokier 4 days ago | parent | next [-]

But as the tools you link demonstrate, having "text" as the on-disk format does not preclude AST-based (or even smarter) tools. So there is little benefit in a non-text format. Ultimately it's all just bytes on disk

gr__or 3 days ago | parent | next [-]

Even that is not without its cost. Most of these tools are written in different languages, which all have to maintain their own parsers, which have to keep up with language changes.

And there are abilities we lose completely by making text the source of truth, like a reliable version control for "this function moved to a new file".
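One way to see what "reliable move tracking" could mean: if functions are structural units rather than lines of text, a tool can fingerprint a function's AST with formatting and location stripped, and recognize it as the same function wherever it lands. A toy sketch in Python (hypothetical scheme, not any existing VCS feature):

```python
import ast, hashlib

def fingerprint(func_src: str) -> str:
    """Hash a function's AST with formatting/location ignored,
    so the 'same' function is recognized wherever it lives."""
    tree = ast.parse(func_src)
    # ast.dump omits line/column attributes by default, so two
    # differently formatted copies produce the same canonical string.
    canonical = ast.dump(tree, annotate_fields=False)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Same function in an "old file" and a "new file", with different
# indentation and spacing:
old = "def area(r):\n    return 3.14 * r * r\n"
new = "def area(r):\n        return 3.14*r*r\n"

assert fingerprint(old) == fingerprint(new)  # a move, not delete+add
```

A line-based VCS sees delete-here/add-there; a structural one could record "area moved" as a single first-class event.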

theamk 3 days ago | parent [-]

At least the parsers are optional now - you can still grep, diff, etc. even if your tools have no idea about the language's semantics.

But if you store ASTs, you _have_ to have support for each language in each of the tools (because each language has its own AST). This is basically a major chicken-and-egg problem - a new language won't be compatible with any of the tools, so adoption will be very low until the editor, diff, sed etc. are all updated... and those tools won't be updated until the language is popular.

And you still don't get any advantages over text! For example, if you really cared about "this function moved to a new file" functionality, you could have a unique id after each function ("def myfunc{f8fa2bdd}..."), and insert/hide them in your editor. This way the IDE can show a nice definition, but grep/git etc. still work, just with extra noise.
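A toy sketch of that id scheme (the brace syntax is the commenter's invented notation, not a feature of any real language): the on-disk text carries the id, a trivial filter recovers the clean view the editor would display, and move-tracking tools key on the id:

```python
import re

# On-disk form: each def carries a stable id, per the suggestion above.
on_disk = "def myfunc{f8fa2bdd}(x):\n    return x + 1\n"

# What the editor would display (id hidden):
display = re.sub(r"\{[0-9a-f]{8}\}", "", on_disk)
print(display)  # def myfunc(x): ...

# What a move-tracking tool would key on:
ids = re.findall(r"def (\w+)\{([0-9a-f]{8})\}", on_disk)
print(ids)  # [('myfunc', 'f8fa2bdd')]
```

The "extra noise" trade-off is visible here: grep and git operate on the raw `on_disk` form, ids and all, while only id-aware tools get the clean view.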

In fact, I bet that any technology people claim requires non-readable AST files can be implemented as text, with many extra upsides and no major downsides (with the obvious exception of truly graphical things - naive diffs on auto-generated images, graphs or schematics files are not going to be very useful, no matter what kind of text format is used).

Want each person to see their own formatting style? Reformat to the person's style on load and format back to the project style on save. Modern formatters are so fast, people won't even notice this.

Want fast semantic search? Maintain the binary cache files, but use text as source-of-truth.

Want better diff output? Same deal, parse and cache.

Want to have no files, but instead have function list and edit each one directly, a la Smalltalk? Maintain files transparently with text code - maybe one file per function, or one file per class, or one per project...

The reason people keep source code as text is that it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version-compatibility pain.

gr__or 3 days ago | parent [-]

The complexity of a parser is orders of magnitude higher than that of an AST schema.

I'm also not saying we can't have all these good things, but they are not free, and the costs are more spread out and thus less obviously noticeable than the ones projectional code imposes.

theamk 3 days ago | parent [-]

Are you talking about runtime complexity or programming-time complexity?

If the runtime, then I bet almost no one will notice, especially if the appropriate caching is used.

If the programming-time - sure, but it's not like you can avoid parsers altogether. If the parsers are not in the tools, they must be in the IDE. Factor out that parsing logic and make it a library all the tools can use (or a one-shot LSP server if you are in a language that has hard-to-use bindings).

Note even with AST-in-file approach, you _still_ need the library to read and write that AST, it's not like you can have a shared AST schema for multiple languages. So either way, tools like diff will need to have a wide variety of libraries linked in, one for each language they support. And at that point, there is not much difference between AST reader and code parser.

gr__or 3 days ago | parent [-]

I meant programming-time, but runtime is also a good point.

Cross-language libraries don't seem to be super common for this. The recovering-sense-from-text tools I named all use different parsers in their respective languages.

Again, reading an AST from a data-exchange formatted file (and yes, technically that's also parsing) is orders of magnitude simpler. And for parsing these schemas there are battle-tested cross-language solutions, e.g. protobuf.
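To illustrate the gap: consuming a serialized AST is one deserialization call plus a walk, with no grammar, lexer, or error recovery involved. JSON stands in here for whatever schema format a projectional tool would actually pick (protobuf, etc.), and the node layout is invented for the example:

```python
import json

# A function encoded as data rather than surface syntax (made-up schema):
serialized = json.dumps({
    "kind": "func", "name": "area",
    "body": [{"kind": "return",
              "expr": {"kind": "mul", "args": ["pi", "r", "r"]}}],
})

# "Parsing" is a single library call; a diff/grep/sed-style tool can
# then walk the tree knowing only the schema, not the language's syntax.
node = json.loads(serialized)
print(node["name"])                       # area
print(node["body"][0]["expr"]["args"])    # ['pi', 'r', 'r']
```

Compare that with maintaining a full parser per language per tool, which is the status quo the parent describes.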

rafaelmn 3 days ago | parent | prev [-]

Why even have a database - let's just keep the data in CSVs, we can grep it easily, it's all bytes on a disk.

gorgoiler 3 days ago | parent | prev | next [-]

I feel it’s important to stick up for the difference between text and code. The two overlap a lot, but not all text is code, even if most code is text.

It’s a really subtle difference but I can’t quite put my finger on why it is important. I think of all the little text files I’ve made over the decades that record information in various different ways where the only real syntax they share is that they use short lines (80 columns) and use line orientation for semantics (lah-dee-dah way of saying lots of lists!)

I have a lot of experience of being firmly ensconced in software engineering environments where the only resources being authored and edited were source code files.

But I’ve also had a lot of experience of the kind of admin / project / clerical work where you make up files as you go along. Teaching in a high school was a great place to practice that kind of thing.

kelseyfrog 3 days ago | parent | prev | next [-]

Thank you for your response. Conveniently, we can use an existing example - Clang's pch files. Could you walk me through using grep, diff, sed, and git on pch? I'd really appreciate it.

jrochkind1 3 days ago | parent | prev | next [-]

So there was an era, as the OP says, where your arguments were popular and believed and it was understood that things would move in this direction.

And yet it didn't, it reversed. I think the fact that "plain text for all source files" actually won in the actual ecosystem wasn't just because too many developers had the wrong idea/short-sightedness -- because in fact most influential people wanted and believed in what you say. It's because there are real factors that make the level of investment required for the other paths unsustainable, at least compared to the text source path.

It's definitely related to the "victory" of unix and unix-style OSs, which is often understood as the victory of a philosophy of doing it cheaper, easier, simpler, faster, "good enough".

It's also got to do with how often languages and platforms change -- both change within a language/platform and languages/platforms rising and falling. Sometimes I wish this were less quick; I'm definitely a guy who wants to develop real expertise with a system by using it over a long time, and I think you can work so much more effectively and productively when you have. But the actual speed of change of platforms and languages we see depends on reduced cost of tooling.

gr__or 3 days ago | parent [-]

For me, that's what "short-sighted inability" means. The business ecosystem we have does not have the attention span for this kind of project. What we need is individuals grouping together against the gradient of incentives (which is hard indeed).

Tooster 4 days ago | parent | prev [-]

I’d also add:

* [Difftastic](https://difftastic.wilfred.me.uk/) — my go-to diff tool for years
* [Nu shell](https://www.nushell.sh/) — a promising idea, but still lacking in design/implementation maturity

What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.

The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MCP, Nu shell), but none have gained real traction.

Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:

* some people stick with text-based tools,
* others move to semantic model editing,
* and both can interoperate in the same codebase.

What would be needed:

* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities) operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell's mistakes, and aim for future-compatible, stable, battle-tested tools.
* A more declarative-first mindset.
* Strong theoretical foundations for the new paradigm.
* Seamless conversion between text-based and semantic models.
* New tools that work with mainstream languages (not niche reinventions), and enforce correctness at construction time (no invalid programs).
* Integration of the semantic model with existing version control systems.
* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP — JetBrains' are better, but LSP won thanks to Microsoft's push).
* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).
* Integrate knowledge across many different projects to distill the best way forward -> for example learn from Roslyn's semantic vs syntax model, look into tree-sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and LISP-like languages, check Unison, adopt the helix/vim editing model, see how it can be integrated with LSP and MCP, etc.

This isn’t something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won’t stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately, that is pretty much impossible for an entity without enough influence.

zokier 4 days ago | parent [-]

But almost every editor worth its salt these days has structural editing.

https://docs.helix-editor.com/syntax-aware-motions.html

https://www.masteringemacs.org/article/combobulate-structure...

https://zed.dev/blog/syntax-aware-editing

Etc etc.

Tooster 3 days ago | parent [-]

And that's a great thing! I look forward to them being more mature and more widely adopted - I have tried both zed and helix, and for day-to-day work they are not yet there, which matters if this stuff is to gain traction. Neither of them, however, intends to be a projectional editor as far as I am aware. As for the vims and emacses out there - I don't think they are mainstream tools which can tip the scale. Even now vim is considered a niche, quirky editor with a very high barrier to entry. And still, they operate primarily on text.

Without support in mainstream editors I don't see how this can push us forward instead of staying a niche barely anyone knows about.