Posted by tie-in 4 days ago
// Creates a user and returns the newly created user's id on success
Hmm, it returns an id? But the @returns is Promise<any>? The code as written will change when userService.create changes... without the actual, human readable bit of prose, that potential code issue could be easily overlooked.
Of course, here the code could have a newtype for UserId and return Promise<UserId>, making the code better and then the prose is basically not needed (but please just write a docstring).
FWIW I would document that the `user` parameter is modified. And document the potential race condition between checking the existence of a user and creating a user, and maybe why it was chosen to be done in this order (kinda flimsy in this example). Which would probably lead me to designing around these issues.
Trying to only document via self-documenting code seems to always omit nuances.
/** Create a user and return the id, or throw an error with an appropriate code.
*
* user.password may be changed after this function is called.
*/
async function createUser(user: User): Promise<number> {
if (!validateUserInput(user)) {
throw new Error(err.userValidationFailed);
}
if (isPasswordValid(user.password)) {
// Check now if the user exists, so we can throw an error before hashing the password.
// Note: if a user is created in the short time between this check and the actual creation,
// there could be an unfriendly error
const userExists = !!(await userService.getUserByEmail(user.email));
if (userExists) {
throw new Error(err.userExists);
}
} else {
throw new Error(err.invalidPassword);
}
user.password = await hashPassword(user.password);
return userService.create(user);
}
`isValid() || throwError()` is an abuse of abstraction
Much of my code, for example, is fail fast, and then I have error handling, supervisors, etc, at a level that can log and restart the work.
I'm a little disappointed no one mentioned this even as a side comment, since there's a whole section on doing short-circuit evaluation, even if it's not in a way that would cause this kind of problem.
1. It is legitimate to skip parts of a doxygen/JSDoc comment (return value, parameter description) if it is _entirely_ trivial. But:
2. You must document the _why_. Well-written function signatures can tell you the "what", but they usually not tell you the "why".
3. You must make an effort to squeeze that non-triviality of a documented aspect. So, you mention corner-case behavior; you relate the use of a parameter to its use elsewhere; and of course - you describe "why"'s: Why that parameter is _not_ of another type, or why it can't be avoided.
I guarantee you, that if you follow rules (2.) and (3.), you will usually not get to apply (1.); and at the same time, your doxygen comments will stop being annoying boilerplate that you automatically gloss over, because you would be much more likely to find useful information there.
It’s mostly done to mark contentious, complicated, or good enough implementations because after reading the code your first reflex would be: Why is it done that way?
Also personally I mark part of the code with short What? or How? comments for quicker scanning especially if the syntax is terse or it’s not immediately obvious. Kinda like a table of contents.
Philosophical differences between your code and a library you're using are a good example of when you explain rather than fix. But if it's just two chunks of your own code it might just be faster in the long run to make the problem go away instead of having to explain repeatedly why it's there at random intervals. People underestimate the cost of interruptions, and load up their future with self-imposed interruptions.
(I do generally agree with your point.)
My philosophy is that comments should be used for two things: 1) to explain code that is not obvious at first glance, and 2) to explain the rationale or humanitarian reasons behind a bit of code that is understandable, but the reasons for its existence are unclear.
No philosophy is perfect, but I find that it strikes a good balance between maintainability of comment and code pairing and me being able to understand what a file does when I come back to it a year later.
The article is not good IMO. They have a perfect example of a function that could actually make use of further comments, or a refactoring to make this more self-documenting:
function isPasswordValid(password) {
const rules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];
return password.length >= 8 && rules.every((rule) => rule.test(password));
}
Uncommented regular expressions are a code smell. While these are simple, the code could be more empathetic to the reader by adding at least a basic comment: function isPasswordValid(password) {
// At least one lowercase, one uppercase, one number and one symbol
const rules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];
return password.length >= 8 && rules.every((rule) => rule.test(password));
}
Which would then identify the potentially problematic use of \W (ie: "[^a-zA-Z0-9]"). And even though I've been writing regular expressions for 20+ years, I still stumble a bit on character classes. I'm likely not the only one.Now you can actually make this function self-documenting and a bit more maintainable with a tiny bit more work:
// Returns either "true" or a string with the failing rule name.
// This return value is kind of awkward.
function isPasswordValid(password) {
// Follow the password guidelines by WebSecuritySpec 2021
const rules = [
[MIN_LENGTH, /.{8,}/],
[AT_LEAST_ONE_LOWERCASE, /[a-z]{1,}/],
[AT_LEAST_ONE_UPPERCASE, /[A-Z]{1,}/],
[AT_LEAST_ONE_NUMBER, /[0-9]{1,}/],
// This will also allow spaces or other weird characters but we decided
// that's an OK tradeoff.
[AT_LEAST_ONE_SYMBOL, /\W{1,}/],
];
for (const [ruleName, regex] of rules) {
if (!regex.test(password)) {
return ruleName;
}
}
return true;
}
You'd probably want to improve the return types of this function if you were actually using in production, but this function at least now has a clear mapping of "unclear code" to "english description" and notes for any bits that are possibly not clear, or are justifications for why this code might technically have some warts.I'm not saying I'd write this code like this -- there's a lot of other ways to write it as well, with many just as good or better with different tradeoffs.
There are lots of ways to make code more readable, and it's more art than science. Types are a massive improvement and JSDoc is so understandably awkward to us.
Your goal when writing code shouldn't be to solve it in the cleverest way, but rather the clearest way. In some cases, a clever solution with a comment can be the clearest. In other cases, it's better to be verbose so that you or someone else can revisit the code in a year and make changes to it. Having the correct number of comments so that they add clarity to code without having too many that they become easily outdated or are redundant is part of this as well.
In fact, I would argue its not only part of it, it's an essential part. I've worked on projects that were written by folks who were very into the "the source code is the documentation" and comments were extremely minimal. Which was fine for the most part, they did a pretty good job at naming things. The problem was what happened when you were trying to debug a problem. The problems usually start looking like this:
"I see ${function} code does X, and that's causing Y which ultimately leads to bug Z, is ${function} supposed to do X, and we should be checking the condition at another level where Y happens or should ${function} not be doing X and that's the root bug?". And no one knows the answer because there's no documentation about what the function is SUPPOSED to do. I view actual comments and documentation about code as the "parity bit" equivalent for CRC checks / RAID 5 storage. If the code and the documentation agree, great now we can have a discussion about whether the code needs to change. If the code and the documentation disagree, then the code is wrong just like if I get data and a parity that don't match, the data is wrong. Your example of the "AT_LEAST_ONE_SYMBOL" with the associated comment is a perfect example of why this is critical. If some months down the line spaces in the password is causing a problem, I now know the code is supposed to be allowing spaces and we know we need to fix the part that's choking on spaces, not the validation rule.
Yes it's possible the documentation/parity is wrong, but in either case it calls out a large warning that we're out of sync at this point, and need to get in sync before we can continue.
Not only to explain why this stupid code is here, but to give the next person permission to delete it if the bug has been fixed upstream and we have pulled the new version down.
Regex is about the opposite of self documenting as it comes, anything non-trivial needs an explanation!
Look at a mechanical watch. (For example, here: https://ciechanow.ski/mechanical-watch/). Those little details, can you come up with self-documenting names for them? I do not think so. In programming good design is very much like that watch: it has lots of strangely-looking things that are of that shape because it fits their purpose [1]. There is no way to give them some presumably short labels that explain that purpose out of the context. Yet we need to point to them as we talk about them [2]. The role of names in programming is thus much more modest. In the order of importance:
- They must be distinct within the context (of course). - Yet their form must indicate the similarities between them: alen and blen are of the same kind and are distinct from abuf and bbuf, which are also of the same kind. - They must be pronounceable and reasonably short. Ideally they should be of the same length. - They need to have some semblance to the thing they represent. - It would be nice to make them consistent across different contexts. Yet this is is incredibly tedious task of exponential complexity.
There is also the overall notation. Ideally it should resemble written reasoning that follows some formal structure. None of existing notations is like that. The expressive tools in these notations are not meant to specify reasoning: they are meant to specify the work of a real or virtual machine of some kind. The fallacy of self-documenting code is an unrecognized desire to somehow reason with the knobs of that machine. It will not work this way. Yet a two-step process would work just fine: first you reason, then you implement this on the machine. But it will not look self-documenting, of course. P. S. This is a major problem in programming: we keep the code, but do not keep the reasoning that led to it.
[1] fitness for the purpose, Christopher Alexander, “The timeless way of building”. [2] notion vs definition, Evald Ilyenkov.
- Spaces vs Tabs
- Self documenting code vs documented code
- Error Codes vs Exceptions
- Monolithic vs Microservices Architectures
- etc.
Context matters and your context should probably drive your decisions, not your personal ideology. In other words, be the real kind of agile; stay flexible and change what needs to be changed as newly found information dictates.
But he keeps the cryptic error codes that will go into the logs, or in the frontend where the developer will have to look up the error code. Don't map an error name to u105, just return the actual string: "userValidationFailed".