Formatting code should be unnecessary

Posted by MaxLeiter 5 days ago

Formatting code should be unnecessary (maxleiter.com)

350 points | 476 comments | page 2

rs186 5 days ago

Ah, eslint-config-airbnb. My favorite airbnb config issues:

https://github.com/airbnb/javascript/issues/1271

https://github.com/airbnb/javascript/issues/1122

I literally spent over an hour adapting an existing project to the airbnb config, even though the code was perfectly correct, clear, and maintainable. I ended up disabling those specific rules locally, and I never used the config in another project. (It looks like the whole project is no longer maintained. Good riddance.)

The airbnb config is, in my view, the perfect example of unnecessarily wasting people's productivity when linting is done badly.

lisper 5 days ago

It never ceases to amaze me how many times people can essentially re-invent S-expressions without realizing that's what they are doing.

benrutter 5 days ago

Scanning the comments waiting for a lisper to comment and found one!

I guess lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.

Jach 5 days ago

In actual Common Lisp development, code is stored in text files and edited and diffed as text in source controlled repositories. Once code is evaluated by an implementation, it's a different story, but before that there are many formatting options. It's mostly around where to put line breaks, whitespace, and parens, but still. The other day I wrote this simple function:

    (defun check-password-against-hash (password hash)
      (handler-case
        (bcrypt:password= password hash)
        (error () nil)))

There are already multiple choices on formatting (and naming, and other things) just from this sample.

In theory, a system could be made where this level of code isn't what's actually stored and is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, and I can ask SBCL to describe it back to me:

    * (describe #'check-password-against-hash)
    #<FUNCTION CHECK-PASSWORD-AGAINST-HASH>
      [compiled function]
    
    Lambda-list: (PASSWORD HASH)
    Derived type: (FUNCTION (T T) *)
    Source form:
      (LAMBDA (PASSWORD HASH) (BLOCK CHECK-PASSWORD-AGAINST-HASH (HANDLER-CASE (CL-BCRYPT:PASSWORD= PASSWORD HASH) (ERROR NIL NIL))))

I can also ask SBCL to show me the disassembly. Perhaps, again in theory, a system could be made where you can get and edit text at that level of abstraction before putting it back in.

    * (disassemble #'check-password-against-hash)
    ; disassembly for CHECK-PASSWORD-AGAINST-HASH
    ; Size: 308 bytes. Origin: #xB8018AA278                       ; CHECK-PASSWORD-AGAINST-HASH
    ; 278:       498B4510         MOV RAX, [R13+16]               ; thread.binding-stack-pointer
    ; 27C:       488945F8         MOV [RBP-8], RAX
    ; 280:       488965D8         MOV [RBP-40], RSP
    ; 284:       488D45B0         LEA RAX, [RBP-80]
    ; 288:       4D8B7520         MOV R14, [R13+32]               ; thread.current-unwind-protect-block
    ; 28C:       4C8930           MOV [RAX], R14
    ; ... and so on ....

(SBCL does actually let you modify the compiled code directly if you felt the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)

But just going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. For example, someone might prefer the first expression given to handler-case to be on the same line instead of a new line like I did. But to such a person, is that preference universal, or does it depend on the specific expressions involved? There are other, not strictly formatting-related preferences at play here too, like the use of "cl-bcrypt" vs "bcrypt" as the package name, or one could arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest universal preference I have around this general topic is that I really hate enforced format tools, even if they bent to my specific desires 100% of the time.

I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV or INI like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.

lisper 5 days ago

> In actual Common Lisp development, code is stored in text files and edited and diffed as text in source controlled repositories.

That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.

This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Note that Python is a notable exception.)
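As a rough illustration of the Python exception mentioned above, here is a minimal sketch using the standard-library `ast` module. The function name `check` and both source strings are invented for the example. It only shows that two layouts of the same code parse to the same tree and print back in one fixed style. Note that `ast` discards comments, so this is not a lossless round trip.

```python
import ast

# Two spellings of the same (made-up) function; only the formatting differs.
compact = "def check(password, stored): return password == stored"
spread = """
def check(
    password,
    stored,
):
    return (
        password == stored
    )
"""

tree_a = ast.parse(compact)
tree_b = ast.parse(spread)

# ast.dump leaves out line/column attributes by default,
# so layout differences do not show up in the dumped tree.
same_tree = ast.dump(tree_a) == ast.dump(tree_b)

# ast.unparse (Python 3.9+) prints a tree back to source in one fixed style.
printed_a = ast.unparse(tree_a)
printed_b = ast.unparse(tree_b)

print(same_tree)
print(printed_a)
```

Generated code can also stay a tree the whole way: `compile()` accepts an `ast.Module` object directly, so the output of a code generator does not have to pass through text.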

whartung 5 days ago

It's interesting that, despite the utility of S-expressions mentioned above, semantic diff of CL code, for example, is uncommon.

By that I mean highlighting the diff between these:

    (dolist (i l)
      (print (car i)))

    (dolist (i l) (print (cdr i)))

With the diff highlighting the `car` changed to `cdr` rather than just the raw lines being changed.

I'm pretty sure this exists, but it's uncommon (at least to me it's uncommon).
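To make the idea concrete, here is a toy sketch in Python (not an existing CL tool). The names `read_sexp` and `tree_diff` are made up for the example, and the reader ignores strings, comments, and reader macros. It parses both forms into nested lists and reports only the subtrees that differ, so the line-break change is invisible and the `car` to `cdr` change is all that remains.

```python
def read_sexp(text):
    """Parse one S-expression into nested lists of atom strings (toy reader)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # skip the closing paren
            return items
        return token

    return parse()


def tree_diff(old, new, path=()):
    """Return (path, old, new) triples for the subtrees that differ."""
    if old == new:
        return []
    both_lists = isinstance(old, list) and isinstance(new, list)
    if both_lists and len(old) == len(new):
        changes = []
        for index, (a, b) in enumerate(zip(old, new)):
            changes.extend(tree_diff(a, b, path + (index,)))
        return changes
    # Atoms differ, or the shapes differ: report the whole subtree.
    return [(path, old, new)]


before = """
(dolist (i l)
  (print (car i)))
"""
after = "(dolist (i l) (print (cdr i)))"

changes = tree_diff(read_sexp(before), read_sexp(after))
print(changes)  # [((2, 1, 0), 'car', 'cdr')]
```

The path `(2, 1, 0)` means: third element of the `dolist` form, second element of the `print` form, first element of that list. This sketch only compares lists position by position. It makes no attempt to align inserted or deleted elements, which is where a real structural diff has to do most of its work.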

lisper 5 days ago

It is uncommon because it turns out that text diff is good enough 99% of the time, especially if you follow normal formatting and indentation conventions.

Also, structural diff is actually a very hard problem.

mdaniel 5 days ago

Wait until that Bablr user shows up to these threads, and then you'll really have to start drinking

conartist6 5 days ago

Wow, I am thoroughly honored. You are probably the first person ever who isn't me to bring it up in a thread.

I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)

oftenwrong 5 days ago

Storing an IR also means we can create languages beyond the limits of syntactical practicality. Imagine, for example, an entire comment/documentation dimension of the code. Instead of commenting on a line near some code, you could attach comments semantically to an expression, or to a variable, or to any unit of code.

hliyan 5 days ago

Token, statement and block level annotations would actually be nice. Perhaps even nicer if those annotations could be structured data instead of just text. You could create a truly self-describing code base without having to worry too much about the second hardest problem in programming.

kesor 5 days ago

This is how Chrome Dev Tools shows source code. The original is often minified or in whatever format the author left it in. When you check the "pretty" checkbox in dev tools, it is displayed in whichever format the Chrome developers decided on.

shmerl 5 days ago

You can't easily search / grep etc. an IR, unless you use some kind of reverse translator. Readable source files have their benefits in being simple in that sense.

marssaxman 5 days ago

Imagine having to write a new diff tool for each language!

kesor 5 days ago

You don't need a special grep for every language, you just need a tool that translates the mini version into the formatted version and back. Then you chain the tools, just like anything else in UNIX.
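A minimal sketch of that chaining, using Python source as the stored format and the standard-library `ast` module as the translator. The helper names `normalize` and `grep`, and the sample source, are invented for the example. It covers only the stored-to-formatted direction and drops comments, so treat it as an illustration of the pipeline shape rather than a usable tool.

```python
import ast
import re


def normalize(source):
    """Re-print Python source in the single style that ast.unparse produces."""
    return ast.unparse(ast.parse(source))


def grep(pattern, text):
    """Return the lines of text that match the regular expression."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


# Oddly formatted "stored" source (hypothetical).
stored = "total=add( 1,\n      2 )\nprint( total )"

# Searching the raw text misses the call; searching the normalized text finds it.
raw_hits = grep(r"add\(1, 2\)", stored)
normalized_hits = grep(r"add\(1, 2\)", normalize(stored))

print(raw_hits)         # []
print(normalized_hits)  # ['total = add(1, 2)']
```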

marssaxman 5 days ago

Seems reasonable. Since you're likely to perform this translation more than once for any given file, it seems like it would be practical to cache the translated output, perhaps as a file on disk.

eviks 5 days ago

> unless you use some kind of reverse translator

Would a few decades help in universally having such a translator in all the tools?

account42 5 days ago

In all the tools? No, you'd need an infinite amount of time for that.

eviks 5 days ago

The tools didn't require an infinite amount of time to write, so why would it take infinity to change a format??? (But no, not all the tools, just all the ones that are used.)

TheAlchemist 5 days ago

I like that. We should have something like this for Python.

Black is great, but maybe it's just me since it aligns with how I like the code formatted.

Would there be any downsides if Python (or git?) defined a standard format for saving a valid file, with all the formatting needed to read the file happening in the IDE that displays it?

That would very much fit with the Python ethos 'There should be one-- and preferably only one --obvious way to do it.'

benrutter 5 days ago

I think the downside would mainly be complexity. As soon as you do that, you have to develop what that intermediate representation is and how it gets stored. But moreover, you'd need to develop workarounds for the fact that all external code infrastructure (version control, editors, command line tools) is built for text.

I can't see a crazy huge downside from a Python point of view, but it seems like a much bigger upside than flexible formatting would be needed to justify breaking from all of that stuff.

TheAlchemist 5 days ago

I was thinking of just plain Python: let's say Black-formatted code is the default and we commit only that. Then, on the visualization side, the IDE can format it however we want.

Actually, this could be a really easy feature for an IDE, and it could already work today.

__MatrixMan__ 5 days ago

Unison doesn't move the formatting choices further than the machine on which the code was written. The codebase only contains the AST.

It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say if it's a useful idea.

wonger_ 5 days ago

Yeah, if any language has potential for AST source of truth instead of textual source of truth, it's Unison.

I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.

The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...

> Each Unison definition is identified by a hash of its syntax tree.
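For a feel of what "identified by a hash of its syntax tree" can mean, here is a rough analogy in Python. This is not Unison's algorithm and makes no attempt to mirror it. The helper name `definition_hash` and the sample function `double` are made up. It hashes the `ast.dump` of a definition, which carries no line or column information, so reformatting leaves the hash unchanged while a real code change alters it.

```python
import ast
import hashlib


def definition_hash(source):
    """Hash the syntax tree of a piece of Python source, ignoring its layout."""
    tree = ast.parse(source)
    canonical = ast.dump(tree)  # no line or column numbers by default
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


one_line = "def double(x): return x * 2"
multi_line = "def double(\n    x,\n):\n    return (x * 2)\n"
changed = "def double(x): return x * 3"

print(definition_hash(one_line) == definition_hash(multi_line))  # True
print(definition_hash(one_line) == definition_hash(changed))     # False
```

One caveat with this particular sketch: the text produced by `ast.dump` can differ between Python versions, so these hashes are only comparable within one interpreter version.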

oftenwrong 5 days ago

Unison's immutable definitions also enable a bunch of compelling capabilities. No merge conflicts. Incremental everything: build, test, lint, distribution, rendering as formatted text, et cetera. Trivial to apply "hot" updates to running systems.

lordnacho 5 days ago

Aren't most projects these days written in a mix of languages, most of them text? You'd have to get them to change to use the same tools we currently use, or else you'd have to use special tools. The beauty of the modern stack is the base tools are near universal.

If you want everyone to see their own preference of format, either write a script or get AI to format it for you.

ChrisMarshallNY 5 days ago

I've heard that Google works [sort of] that way (don't know, myself). They have a lot of tools that allow devs to use what formatting they want, and it's made standard, during checkin.

I heard this, many years ago, when we used Perforce. The Perforce consultant that we dealt with, told us this, as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google. I dunno).

I have heard that this was one of the goals of developing IDLs. I think the vision was, that you could have a dozen different programmers, working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). They would be converted to a common IDL, when submitted to configuration management, and then extracted from that, when the user looks at it.

I can't see that working, but a lot of stuff that I used to think was crazy, has happened, so, who knows?

yojo 5 days ago

I can confirm that Google was using Perforce for version control extensively, at least through 2008. I think it was somehow customized, but I definitely have lingering muscle memory around “p4 sync” and “p4 submit”.

I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about perforce/p4.

__loam 5 days ago

Go was designed at Google with a built-in formatter, gofmt, explicitly to address this and prevent bikeshedding.

laserbeam 5 days ago

Reminds me of Dion Systems. A few years ago, a group of devs was working on a programming environment that feels very close to what DIANA is describing.

The project is dead enough that they no longer own the company's domain. As far as I know, the only remnants of the project are YouTube recordings of demos held at conferences.
