lisper 4 days ago

It never ceases to amaze me how many times people can essentially re-invent S-expressions without realizing that's what they are doing.

benrutter 4 days ago | parent | next [-]

Scanning the comments waiting for a lisper to comment and found one!

I guess lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.

Jach 4 days ago | parent [-]

In actual Common Lisp development, code is stored in text files and edited and diffed as text in source controlled repositories. Once code is evaluated by an implementation, it's a different story, but before that there are many formatting options. It's mostly around where to put line breaks, whitespace, and parens, but still. The other day I wrote this simple function:

    (defun check-password-against-hash (password hash)
      (handler-case
        (bcrypt:password= password hash)
        (error () nil)))

There are already multiple choices on formatting (and naming, and other things) in just this sample.

In theory a system could be made where this level of code isn't what's actually stored and is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, and I can ask SBCL to describe it back to me:

    * (describe #'check-password-against-hash)
    #<FUNCTION CHECK-PASSWORD-AGAINST-HASH>
      [compiled function]
    
    Lambda-list: (PASSWORD HASH)
    Derived type: (FUNCTION (T T) *)
    Source form:
      (LAMBDA (PASSWORD HASH) (BLOCK CHECK-PASSWORD-AGAINST-HASH (HANDLER-CASE (CL-BCRYPT:PASSWORD= PASSWORD HASH) (ERROR NIL NIL))))

I can also ask SBCL to show me the disassembly. Perhaps, again in theory, a system could be made where you can get and edit text at that level of abstraction before putting it back in.

    * (disassemble #'check-password-against-hash)
    ; disassembly for CHECK-PASSWORD-AGAINST-HASH
    ; Size: 308 bytes. Origin: #xB8018AA278                       ; CHECK-PASSWORD-AGAINST-HASH
    ; 278:       498B4510         MOV RAX, [R13+16]               ; thread.binding-stack-pointer
    ; 27C:       488945F8         MOV [RBP-8], RAX
    ; 280:       488965D8         MOV [RBP-40], RSP
    ; 284:       488D45B0         LEA RAX, [RBP-80]
    ; 288:       4D8B7520         MOV R14, [R13+32]               ; thread.current-unwind-protect-block
    ; 28C:       4C8930           MOV [RAX], R14
    ; ... and so on ....

(SBCL does actually let you modify the compiled code directly if you feel the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)

But just going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. For example, someone might prefer the first expression given to handler-case to be on the same line instead of on a new line like I did. But for such a person, is that preference universal, or does it depend on the specific expressions involved? Other preferences that aren't strictly about formatting are at play here too, like the use of "cl-bcrypt" vs "bcrypt" as the package name, or one could arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest thing to a universal preference I have around this general topic is that I really hate enforced format tools, even if they bent to my specific desires 100% of the time.

I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV- or INI-like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.

lisper 3 days ago | parent [-]

> In actual Common Lisp development, code is stored in text files and edited and diffed as text in source controlled repositories.

That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.
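
To make the read/print round trip concrete, here is a toy sketch in Python. It is not Common Lisp's actual READ and PRINT: the names `tokenize`, `read`, and `to_text` are made up for this example, and it handles only symbols and parentheses (no strings, comments, or reader macros). It shows two differently formatted texts reading to the same structure, and that structure surviving both a print/read cycle and an alternative serialization (JSON here, standing in for a database column).

```python
import json

def tokenize(text):
    # Pad parentheses with spaces so split() separates them from symbols.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read_from(tokens):
    """Consume one form from the front of the token list."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read_from(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    return token.upper()  # fold symbols to upper case, like CL's default reader

def read(text):
    return read_from(tokenize(text))

def to_text(form):
    """Print a form back out on one line with single spaces."""
    if isinstance(form, list):
        return "(" + " ".join(to_text(f) for f in form) + ")"
    return form

multi_line = """
(defun check-password-against-hash (password hash)
  (handler-case
    (bcrypt:password= password hash)
    (error () nil)))
"""
one_line = ("(defun check-password-against-hash (password hash) "
            "(handler-case (bcrypt:password= password hash) (error () nil)))")

form = read(multi_line)
assert form == read(one_line)               # formatting does not affect the structure
assert read(to_text(form)) == form          # print then read gives an equal structure
assert json.loads(json.dumps(form)) == form # so does a non-textual-Lisp serialization
```

Unlike SBCL, this toy prints the empty list as `()` rather than `NIL`; a real printer makes many more such decisions.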

This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Note that Python is a notable exception.)
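
For the Python exception, a minimal sketch using the standard library's `ast` module (`ast.unparse` needs Python 3.9 or later). The function being transformed and the `Rename` class are invented for illustration. The point is that the code is parsed, rewritten, and compiled as a tree, without generating new source text along the way.

```python
import ast

source = "def check(password, hashed):\n    return password == hashed\n"
tree = ast.parse(source)

class Rename(ast.NodeTransformer):
    """Hypothetical transformer: rename every function definition it visits."""
    def visit_FunctionDef(self, node):
        node.name = "check_password"
        return node

tree = Rename().visit(tree)
ast.fix_missing_locations(tree)  # fill in line/column info for the compiler

# compile() accepts the AST object directly; no round trip through text.
namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
assert namespace["check_password"]("a", "a") is True

# Text is still available as a projection of the tree when a human wants it.
print(ast.unparse(tree))
```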

whartung 3 days ago | parent [-]

It's interesting that, despite the utility of S-expressions mentioned above, semantic diff of CL code, for example, is uncommon.

By that I mean highlighting the diff between these:

  (dolist (i l)
    (print (car i)))

  (dolist (i l) (print (cdr i)))

The diff would highlight that `car` changed to `cdr`, rather than just marking the raw lines as changed.

I'm pretty sure this exists, but it's uncommon (at least to me it's uncommon).
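
A rough sketch of the idea in Python, assuming a toy reader that only understands symbols and parentheses (`tokenize`, `read`, and `tree_diff` are names invented here, not an existing tool). It compares the two forms position by position, so it only handles in-place substitutions; insertions, deletions, and moved subtrees are not handled at all.

```python
def tokenize(text):
    # Pad parentheses with spaces so split() separates them from symbols.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read_from(tokens):
    """Consume one form from the front of the token list."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read_from(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    return token

def read(text):
    return read_from(tokenize(text))

def tree_diff(old, new, path=()):
    """Yield (path, old, new) for each position where the two forms differ."""
    if old == new:
        return
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (a, b) in enumerate(zip(old, new)):
            yield from tree_diff(a, b, path + (index,))
    else:
        yield path, old, new

before = read("""(dolist (i l)
    (print (car i)))""")
after = read("(dolist (i l) (print (cdr i)))")

changes = list(tree_diff(before, after))
print(changes)  # [((2, 1, 0), 'car', 'cdr')]
```

The path `(2, 1, 0)` means: third element of the `dolist` form, second element of the `print` form, first element of the inner call. The line-break change never shows up because it does not exist in the parsed structure.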

lisper 3 days ago | parent [-]

It is uncommon because it turns out that text diff is good enough 99% of the time, especially if you follow normal formatting and indentation conventions.

Also, structural diff is actually a very hard problem.

mdaniel 4 days ago | parent | prev [-]

Wait until that Bablr user shows up to these threads, and then you'll really have to start drinking

conartist6 4 days ago | parent [-]

Wow I am thoroughly honored. You are probably the first person ever who isn't me to bring it up in a thread.

I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)