Self-Documenting Code

Posted by tie-in 10/23/2024

Self-Documenting Code(lackofimagination.org)

72 points | 126 commentspage 2

tln 10/23/2024|

I find the comment at the end interesting

// Creates a user and returns the newly created user's id on success

Hmm, it returns an id? But the @returns is Promise<any>? The code as written will change when userService.create changes... without the actual, human readable bit of prose, that potential code issue could be easily overlooked.

Of course, here the code could have a newtype for UserId and return Promise<UserId>, making the code better and then the prose is basically not needed (but please just write a docstring).

FWIW I would document that the `user` parameter is modified. And document the potential race condition between checking the existence of a user and creating a user, and maybe why it was chosen to be done in this order (kinda flimsy in this example). Which would probably lead me to designing around these issues.

Trying to only document via self-documenting code seems to always omit nuances.

    /** Create a user and return the id, or throw an error with an appropriate code.
     * 
     * user.password may be changed after this function is called.
     */
    async function createUser(user: User): Promise<number> {
        if (!validateUserInput(user)) {
            throw new Error(err.userValidationFailed);
        }

        if (isPasswordValid(user.password)) {
            // Check now if the user exists, so we can throw an error before hashing the password.
            // Note: if a user is created in the short time between this check and the actual creation, 
            // there could be an unfriendly error
            const userExists = !!(await userService.getUserByEmail(user.email));
            if (userExists) {
                throw new Error(err.userExists);
            }
        } else {
            throw new Error(err.invalidPassword);
        }

        user.password = await hashPassword(user.password);
        return userService.create(user);
    }

Savageman 10/23/2024|

This, but move isPasswordValid below if (!isUserValid)

gnarlouse 10/23/2024||

Having a function throwError makes me squirm.

`isValid() || throwError()` is an abuse of abstraction

t-writescode 10/23/2024|

A fail fast paradigm is a style of programming that can be used very effectively, as long as it's understood that's the style of code that is being written.

Much of my code, for example, is fail fast, and then I have error handling, supervisors, etc, at a level that can log and restart the work.

Izkata 10/25/2024||

Not in anything touching user or password validation. It's probably fairly safe the way it's used in this function, but I'd avoid it as a rule of thumb to not accidentally introduce some sort of timing attack.

I'm a little disappointed no one mentioned this even as a side comment, since there's a whole section on doing short-circuit evaluation, even if it's not in a way that would cause this kind of problem.

jnsie 10/24/2024||

I lived in the C# world for a while and our style guides mandated that we use those JSDoc style comments for every function definition. I loathed them. They invariable became a more verbose and completely redundant version of the function definition. Developers even used a tool (GhostDoc, IIRC) to generate these comments so that CreateNewUser() became // Create New User. Nobody ever read them, few ever updated them, and they reinforced my hunch that a small percentage of comments are useful (in which case, by all means, use comments!)

einpoklum 10/24/2024|

I'm a library author and maintainer, in C++ rather than C#. I know what you mean about the redundancy, and my solution to that - which, of course, requires management approval in your case - is the following:

1. It is legitimate to skip parts of a doxygen/JSDoc comment (return value, parameter description) if it is _entirely_ trivial. But:

2. You must document the _why_. Well-written function signatures can tell you the "what", but they usually not tell you the "why".

3. You must make an effort to squeeze that non-triviality of a documented aspect. So, you mention corner-case behavior; you relate the use of a parameter to its use elsewhere; and of course - you describe "why"'s: Why that parameter is _not_ of another type, or why it can't be avoided.

I guarantee you, that if you follow rules (2.) and (3.), you will usually not get to apply (1.); and at the same time, your doxygen comments will stop being annoying boilerplate that you automatically gloss over, because you would be much more likely to find useful information there.

rerdavies 10/24/2024||

Could you provide an example of a function or argument type that needs a 'why' explanation. Not getting it.

skydhash 10/25/2024||

Not GP, but here is an example of one: https://git.sr.ht/~delthas/senpai/tree/master/item/ui/buffer...

It’s mostly done to mark contentious, complicated, or good enough implementations because after reading the code your first reflex would be: Why is it done that way?

Also personally I mark part of the code with short What? or How? comments for quicker scanning especially if the syntax is terse or it’s not immediately obvious. Kinda like a table of contents.

rerdavies 11/2/2024||

lol. Fabulous comment!

I entirely understand the use of why comments in the body of code.

I was more curious about why GP thought that functions and parameters should always have why annotations (and only have why annotations). Which seems a bit monstrous, unless I'm misunderstanding something.

gtirloni 10/24/2024||

Is writing a few comments here and there explaining why things are done in a certain way so terrible that we have to create this thing?

hinkley 10/24/2024|

Every time you write documentation, consider if it would be less time and energy to fix the problem you're describing instead of apologizing for it.

Philosophical differences between your code and a library you're using are a good example of when you explain rather than fix. But if it's just two chunks of your own code it might just be faster in the long run to make the problem go away instead of having to explain repeatedly why it's there at random intervals. People underestimate the cost of interruptions, and load up their future with self-imposed interruptions.

mannyv 10/23/2024||

Code only tells you 'what,' not 'why.' And 'why' is usually what matters.

Vinnl 10/23/2024|

https://github.com/reactjs/react.dev/issues/3896

(I do generally agree with your point.)

hinkley 10/24/2024||

I feel dumber for having read that.

mmastrac 10/23/2024||

I've been developing for a very long time and I'm neither on the side of "lots of comments" or "all code should speak for itself".

My philosophy is that comments should be used for two things: 1) to explain code that is not obvious at first glance, and 2) to explain the rationale or humanitarian reasons behind a bit of code that is understandable, but the reasons for its existence are unclear.

No philosophy is perfect, but I find that it strikes a good balance between maintainability of comment and code pairing and me being able to understand what a file does when I come back to it a year later.

The article is not good IMO. They have a perfect example of a function that could actually make use of further comments, or a refactoring to make this more self-documenting:

  function isPasswordValid(password) {
    const rules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];
    return password.length >= 8 && rules.every((rule) => rule.test(password));
  }

Uncommented regular expressions are a code smell. While these are simple, the code could be more empathetic to the reader by adding at least a basic comment:

  function isPasswordValid(password) {
    // At least one lowercase, one uppercase, one number and one symbol
    const rules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];
    return password.length >= 8 && rules.every((rule) => rule.test(password));
  }

Which would then identify the potentially problematic use of \W (ie: "[^a-zA-Z0-9]"). And even though I've been writing regular expressions for 20+ years, I still stumble a bit on character classes. I'm likely not the only one.

Now you can actually make this function self-documenting and a bit more maintainable with a tiny bit more work:

  // Returns either "true" or a string with the failing rule name.
  // This return value is kind of awkward.
  function isPasswordValid(password) {
    // Follow the password guidelines by WebSecuritySpec 2021
    const rules = [
      [MIN_LENGTH, /.{8,}/],
      [AT_LEAST_ONE_LOWERCASE, /[a-z]{1,}/],
      [AT_LEAST_ONE_UPPERCASE, /[A-Z]{1,}/],
      [AT_LEAST_ONE_NUMBER, /[0-9]{1,}/],
      // This will also allow spaces or other weird characters but we decided
      // that's an OK tradeoff.
      [AT_LEAST_ONE_SYMBOL, /\W{1,}/],
    ];

    for (const [ruleName, regex] of rules) {
      if (!regex.test(password)) {
        return ruleName;
      }
    }

    return true;
  }

You'd probably want to improve the return types of this function if you were actually using in production, but this function at least now has a clear mapping of "unclear code" to "english description" and notes for any bits that are possibly not clear, or are justifications for why this code might technically have some warts.

I'm not saying I'd write this code like this -- there's a lot of other ways to write it as well, with many just as good or better with different tradeoffs.

There are lots of ways to make code more readable, and it's more art than science. Types are a massive improvement and JSDoc is so understandably awkward to us.

Your goal when writing code shouldn't be to solve it in the cleverest way, but rather the clearest way. In some cases, a clever solution with a comment can be the clearest. In other cases, it's better to be verbose so that you or someone else can revisit the code in a year and make changes to it. Having the correct number of comments so that they add clarity to code without having too many that they become easily outdated or are redundant is part of this as well.

tpmoney 10/24/2024||

> Having the correct number of comments so that they add clarity to code without having too many that they become easily outdated or are redundant is part of this as well.

In fact, I would argue its not only part of it, it's an essential part. I've worked on projects that were written by folks who were very into the "the source code is the documentation" and comments were extremely minimal. Which was fine for the most part, they did a pretty good job at naming things. The problem was what happened when you were trying to debug a problem. The problems usually start looking like this:

"I see ${function} code does X, and that's causing Y which ultimately leads to bug Z, is ${function} supposed to do X, and we should be checking the condition at another level where Y happens or should ${function} not be doing X and that's the root bug?". And no one knows the answer because there's no documentation about what the function is SUPPOSED to do. I view actual comments and documentation about code as the "parity bit" equivalent for CRC checks / RAID 5 storage. If the code and the documentation agree, great now we can have a discussion about whether the code needs to change. If the code and the documentation disagree, then the code is wrong just like if I get data and a parity that don't match, the data is wrong. Your example of the "AT_LEAST_ONE_SYMBOL" with the associated comment is a perfect example of why this is critical. If some months down the line spaces in the password is causing a problem, I now know the code is supposed to be allowing spaces and we know we need to fix the part that's choking on spaces, not the validation rule.

Yes it's possible the documentation/parity is wrong, but in either case it calls out a large warning that we're out of sync at this point, and need to get in sync before we can continue.

hinkley 10/24/2024|||

When I put a hack in place as a workaround for a bug in a dependency, I always link the ticket for the dependency in the PR and the commit comment. If the code looks like someone held a gun to my head or I was on drugs when I wrote it, I'll also put it inline in a comment above the workaround.

Not only to explain why this stupid code is here, but to give the next person permission to delete it if the bug has been fixed upstream and we have pulled the new version down.

LorenPechtel 10/24/2024||

Argh! While I agree 100% with your thoughts here I find how you wrote the list of rules to be very hard to read. And I don't like the comment in the middle of the rules, move it above the declaration.

Regex is about the opposite of self documenting as it comes, anything non-trivial needs an explanation!

mmastrac 10/24/2024||

Well yeah, that's why I said it was "art" :). It's very subjective what's readable and what isn't, and putting code on HN without syntax highlighting (and with the ironic difficulty of editing code here) is one of hardest ways to communicate.

Mikhail_Edoshin 10/25/2024||

Names do have some importance. If you pick random words and assign them to things you deal with you will find yourself unable to reason about them. Try it, it is interesting. Yet names are not the pinnacle of design. Far from it.

Look at a mechanical watch. (For example, here: https://ciechanow.ski/mechanical-watch/). Those little details, can you come up with self-documenting names for them? I do not think so. In programming good design is very much like that watch: it has lots of strangely-looking things that are of that shape because it fits their purpose [1]. There is no way to give them some presumably short labels that explain that purpose out of the context. Yet we need to point to them as we talk about them [2]. The role of names in programming is thus much more modest. In the order of importance:

- They must be distinct within the context (of course). - Yet their form must indicate the similarities between them: alen and blen are of the same kind and are distinct from abuf and bbuf, which are also of the same kind. - They must be pronounceable and reasonably short. Ideally they should be of the same length. - They need to have some semblance to the thing they represent. - It would be nice to make them consistent across different contexts. Yet this is is incredibly tedious task of exponential complexity.

There is also the overall notation. Ideally it should resemble written reasoning that follows some formal structure. None of existing notations is like that. The expressive tools in these notations are not meant to specify reasoning: they are meant to specify the work of a real or virtual machine of some kind. The fallacy of self-documenting code is an unrecognized desire to somehow reason with the knobs of that machine. It will not work this way. Yet a two-step process would work just fine: first you reason, then you implement this on the machine. But it will not look self-documenting, of course. P. S. This is a major problem in programming: we keep the code, but do not keep the reasoning that led to it.

[1] fitness for the purpose, Christopher Alexander, “The timeless way of building”. [2] notion vs definition, Evald Ilyenkov.

virgilp 10/23/2024||

missed the opportunity to create named constants for each of the password validation rules.

sesteel 10/25/2024||

Looking at this thread, it is a wonder that any PRs make it through review. I started calling these kinds of debates Holographic Problems.

- Spaces vs Tabs

- Self documenting code vs documented code

- Error Codes vs Exceptions

- Monolithic vs Microservices Architectures

- etc.

Context matters and your context should probably drive your decisions, not your personal ideology. In other words, be the real kind of agile; stay flexible and change what needs to be changed as newly found information dictates.

Aeolun 10/24/2024|

> Short-circuit evaluation allows us to simplify conditional statements by using logical operators.

Simple is not the same thing as understandable.

They lost me entirely here.

More comments...