Git commands I run before reading any code

Posted by grepsedawk 10 hours ago

Git commands I run before reading any code(piechowski.io)

1429 points | 321 comments

pzmarzly 9 hours ago|

Jujutsu equivalents, if anyone is curious:

What Changes the Most

    jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20

Who Built This

    jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
      -T 'self.author().name() ++ "\n"' \
      | sort | uniq -c | sort -nr

Where Do Bugs Cluster

    jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20

Is This Project Accelerating or Dying

    jj log --no-graph -r 'ancestors(trunk())' \
      -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
      | sort | uniq -c

How Often Is the Team Firefighting

    jj log --no-graph \
      -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'

Much more verbose, closer to programming than shell scripting. But less flags to remember.

palata 9 hours ago||

To me, it makes jujutsu look like the Nix of VCSes.

Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.

rjh29 1 hour ago|||

It's the dvorak of git... Maybe more efficient but incompatible with everyone else and a very loud vocal minority.

You can find this pattern again and again. How many redditors say 120fps is essential for gaming or absolutely require a mechanical keyboard?

palata 45 minutes ago|||

Yeah I think that dvorak is a good example, too.

I don't get the mechanical keyboard one, though. I am fine with any keyboard, I just like my mechanical keyboard at home. Just like I am fine with any chair, but ideally I would have a chair I like at home.

120fps I have no experience with, but I would imagine it's closer to video quality. Once you're used to watching everything in 4K, probably it feels frustrating to watch a 1080p video. But when 4K did not exist, it was not a need. I actively try to not get used to 4K because I don't want to "create the need" for it :-).

RankingMember 1 hour ago||||

I don't even like using "natural" keyboards despite the ergonomic advantage because it ruins my muscle memory when I'm on the (much more prevalent) "regular" keyboard.

SatvikBeri 1 hour ago||||

It's totally compatible though, and that's a big selling point. I use jj and nobody else at my work uses it and that has never been an issue.

palata 49 minutes ago|||

I think the "incompatible" was more in the dvorak sense, which I believe is that whenever you are on another computer, it most likely won't have dvorak.

For jujutsu, it's fine on your own computer, but you probably have to use git in the CI or on remote servers. And you probably started with git, so moving to jujutsu was an added effort (similar to dvorak).

FrankBooth 41 minutes ago|||

It doesn’t support submodules. So no, not totally compatible.

deno 51 minutes ago|||

I mean let's not be hasty. Mechanical keyboards used to be just normal keyboards when computers were still computers.

Jenk 8 hours ago||||

Tbf you wouldn't use/switch to jj for (because of) those kind of commands, and are quite the outlier in the grand list of reasons to use jj. However the option to use the revset language in that manner is a high-ranking reason to use jj in my opinion.

The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)

    jj log -r 'mine() & ~signed()'

    # or if yolo mode...

    jj sign -r 'mine() & ~signed()'

I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.

palata 8 hours ago||

Actually, signing was one of the annoying parts of jujutsu for me: I sign with a security key, and the way jujutsu handled signing was very painful to me (I know it can be configured and I tried a few different ways, but it felt inherent to how jujutsu handles commits (revisions?)).

arccy 7 hours ago||

The only reasonable way to use signing in jj is with the sign-on-push config https://docs.jj-vcs.dev/latest/config/#automatically-signing... rather than as commits are made

Zambyte 7 hours ago||

Why? I have my signing behavior set to own and I haven't noticed any issues, but I don't actually rely on signatures for much.

singron 4 hours ago||

If you need to type in a password to unlock your keychain (e.g. default behavior for gpg-agent), then signing commits one at a time constantly is annoying.

Does "own" try to sign working copy snapshot commits too? That would greatly increase the number and frequency of signatures.

Zambyte 3 hours ago||

Ah, I use my SSH key to sign my commits and I don't have a password on my SSH key.

> Does "own" try to sign working copy snapshot commits too?

Yes

mamcx 2 hours ago|||

No, jj is super simple in daily use, in contrast with git that is a constant chore (and any sane person use alias). This include stuff that in git is a total mess of complexity like dealing with rebases. So not judge the tool for this odd case.

palata 42 minutes ago|||

> in contrast with git that is a constant chore (and any sane person use alias)

I don't use aliases, I guess I'm insane?

Also 99.9% of the time, git "just works" for me. If I need to do something special once a year, I can search for it. Like I would with jujutsu.

Exoristos 2 hours ago|||

One rarely needs more from git than `git add -A && git commit -m`.

dcre 30 minutes ago|||

Saw all the replies crying over how verbose these are, clicked through to TFA expecting to see simpler commands. Nope, they're basically the same thing, just slightly shorter. I would never memorize either the jj or git versions if I planned to use them regularly; I'd make aliases.

stingraycharles 7 hours ago|||

I don’t understand how people can remember all these custom scripting languages. I can’t even remember most git flags, I’m ecstatic when I remember how to iterate over arrays in “jq”, I can’t fathom how people remember these types of syntaxes.

crispyambulance 6 hours ago|||

I am convinced that the vast majority of professionals simply don't bother to remember and, ESPECIALLY WITH GIT, just look stuff up every single time the workflow deviates from their daily usage.

At this point perhaps a million person-years have been sacrificed to the semantically incoherent shit UX of git. I have loathed git from the beginning but there's effectively no other choice.

That said, the OP's commands are useful, I am copying them (because obviously I won't ever memorize them).

freedomben 5 hours ago|||

> I am convinced that the vast majority of professionals simply don't bother to remember and, ESPECIALLY WITH GIT, just look stuff up every single time the workflow deviates from their daily usage.

I wrote a cheat sheet in my notes of common commands, until they stuck in my head and I haven't needed it now for a decade or more. I also lean heavily on aliases and "self-documenting" things in my .bashrc file. Curious how others handle it. A search every time I need to do something would be too much friction for me to stand.

bluGill 1 hour ago|||

I refuse to have alises and other custom commands. Either it is useful for everyone and so I make a change to the upstream project (I have never done this), or it won't exist next time I change my system so there is no point. I do have some custom tools that I am working on that haven't been released yet, but the long term goal is either delete them or release them to more people who will use them so I know it will be there next time I use a different system.

dheera 3 hours ago|||

I just use Claude Code as a terminal for git these days. It writes up better commit messages than I would write anyway. No more "git commit -m fix"

freedomben 2 hours ago||

indeed, I held off for a while but finally caved because I got sick of seeing commits with `git commit -m .` littered in there. These are personal projects so I'm the only one dev-ing on them, but still so nice to have commit messages.

alwillis 2 hours ago||||

> At this point perhaps a million person-years have been sacrificed to the semantically incoherent shit UX of git. I have loathed git from the beginning but there's effectively no other choice.

Yes! We mostly wouldn’t tolerate the complexity and the terrible UX of a tool we use everyday--but there's enough Stockholm Syndrome out there where most of us are willing to tolerate it.

TeMPOraL 25 minutes ago||

Unless you're aware that such powerful commands are something you need once in a blue moon, and then you're grateful that the tool is flexible enough to allow them in the first place.

Git may be sharp and unwieldy, but it's also one of the decreasing amount of tools we still use - the trend of turning tools into toys consumed the regular user market and is eating into tech software as well.

dec0dedab0de 1 hour ago||||

I just use my ide integrations for git. I absolutely love the way pycharm/jetbrains does it, and I'm starting to be ok with how vscode does. Remembering git commands besides the basics is just pointless. If I need to do something that the gui doesn't handle, I'll look it up and put it in a script.

weedhopper 5 hours ago||||

I’ve recently been looking into some tools that provide quick or painless help like pop up snippets with descriptions and cheat sheets, got any recommendations?

arcanemachiner 4 hours ago|||

Navi is good for generating personal cheatsheets:

https://github.com/denisidoro/navi

But for Git, I can't recommend lazygit enough. It's an incredible piece of software:

https://github.com/jesseduffield/lazygit

alwillis 2 hours ago||||

Cheaters: https://brettterpstra.com/projects/cheaters

rusted_gear 5 hours ago|||

I've found tldr to be useful

https://github.com/tldr-pages/tldr

awesome_dude 44 minutes ago|||

The relevant XKCD comic https://xkcd.com/1597/

FWIW I too was once a "memorised a few commands and that was it" type of dev, then I read 3 chapters of the Git book https://git-scm.com/book/en/v2 (well really two, the first chapter was a "these are things you already know") and wow did my life with git change.

Cthulhu_ 7 hours ago||||

I don't, I will google things and fiddle, then put it in a git alias (with a comment on what it does and / or where I got it from) and push it to my private dotfiles repo, taking it with me between computers and projects.

dzaima 3 hours ago||||

jj's template and revset languages are very simple syntactically, so once you're comfortable with the few things you do use often it's just a question of learning about the other existing functions (even if only enough to know to look them up), which slot right in and compose well with everything else you know (unlike flags which typically have each their own system).

Or, perhaps better yet, defining your own functions/helpers as you go for things you might care about, which, by virtue of having been named you, are much easier to remember (and still compose nicely).

dewey 5 hours ago||||

You research it once, use it and then remember that it has "ancestor" in the command somewhere and then use ctrl + R to dig up something from your shell history.

TeMPOraL 22 minutes ago||||

People naturally remember what they use frequently. For things they use infrequently, they search on-line and/or read the friendly manual.

And yes, I'm also ecstatic when I manage to iterate over anything in `jq` without giving up and reaching for online reference. For `git`, functionality I use divides neatly into "things I do at least every week or two" and "things that make me reach for the git book every single time".

I mean, that was true until ~year or so. Now, I just have an LLM on speed dial. `howto do xyz in $tool`, `wtf is git --blah`, `oneliner for frobbing the widget`, etc.

limaoscarjuliet 2 hours ago||||

Some things are idioms that one repeats so often they just stick, e.g. I use "grep.... | cut -c x-y | sort | uniq -c | sort -nr" to quickly grep frequency of some events from a log file.

Don't feel bad - no one remembers them all, we just remember a few idioms we use...

usrbinbash 4 hours ago||||

> I don’t understand how people can remember all these custom scripting languages.

We can't.

Why do you think the `man` command exists?

mgfist 7 hours ago||||

Same, but now with AI I don't have to remember that anymore

awesome_dude 41 minutes ago||

For now - the law of enshittification means that the free/cheap access to AI will be curtailed soon enough.

mgfist 31 minutes ago||

Pretty much any OS locally runnable LLM can generate this stuff.

NoSalt 1 hour ago||||

So, how does one iterate over an array in jq? Asking for a friend.

drob518 3 hours ago||||

Nobody does. One person figures it out, then writes a blog post, and we all Google for it. Even git’s man pages are long and sometimes cryptic.

robrain 4 hours ago||||

If I look something up twice, I record it in Obsidian. If I need it more than a couple of times, I'll probably make an alias, a script or a mask [1] file. Autocomplete and autosuggest are essential to my workflow. And good history search.

[1] https://github.com/jacobdeichert/mask

gspr 40 minutes ago||||

You and me both. Git is just so prevalent and fundamental to so much these days that I forced myself to use only a cheat sheet lying on my desk until I could comfortably use a reasonably productive subset by memory. Little did I know that that would make my colleagues think I'm some sort of git sage.

But jq I use maybe once a week, and it just won't stick. Same for any git features beyond basic wrangling of the history tree (but, on the flip side, that basic wrangling has eliminated 99% of the times I have to look things up).

SoftTalker 4 hours ago||||

Yeah especially with git. All I know is pull, add, commit, push. Everything else I have to look up.

Fokamul 3 hours ago||||

You add them into your GIT config file as shortcuts?

If you have multiple machines (/must have), just apply your user config to current machine?

TheRealPomax 4 hours ago|||

If you don't have to codedive new projects all the time, there's zero reason to memorize these. If your job is to look at new codebases all the time, you probably learn to remember these commands pretty quickly.

socalgal2 4 hours ago|||

a project isn’t dying because of no commits. Rather it’s stable

I often feel I need to setup bots to make superfluous commits just to make it look like my useful and stable repos are “active”

One example (not mine) a a qr-code generator library. Hasn’t been updated in 10 years. It’s perfect as is. It just provides the size and the bits. You convert those bits to any representation you want. It has no need to be updated

fishpen0 1 hour ago|||

In a real company? A private codebase at a minimum should still be getting regular security patching and dependency updates. Always eventually one of those updates requires some level of refactor. If I see a project with no commits, I run away.

wredcoll 4 hours ago||||

It's rare, I think, for a project to have such a well defined and singular purpose that has not changed in 10 years nor have any bugs been discovered or its dependencies changed underneath it.

It's not impossible, of course, but if I saw even a qr library that hadn't changed in 10 years I would worry that it wouldn't build on current systems (due to dependencies) and that nobody was actually using it (due to lag of bug reports).

latexr 4 hours ago|||

I have several of those projects. I avoid dependencies as much as possible, striving to only use things which I know ship with my target OS. I code for a level of correctness and longevity. That benefits everyone, including myself.

A QR (or barcode) library is exactly the type of thing I’d assume would still work fine, since there’s nothing new to do, the parsing rules don’t change, it’s a static, known, solved problem.

robinsonb5 2 hours ago||

> A QR (or barcode) library is exactly the type of thing I’d assume would still work fine, since there’s nothing new to do, the parsing rules don’t change, it’s a static, known, solved problem.

I agree with you - and yet the barcode library I used recently for a variable-data-printing project was last updated 13 hours ago, despite having been around since 2008!

xp84 3 hours ago|||

Well said. Even an awesome library with no bugs that has no external dependencies still depends on the stdlib. For a while, before we were using containers, we even had the issue on Mac dev machines especially, where a half dozen Rubygems would crash while building its C extensions if your Mac OS version wasn’t just what the author expected, due to changes in the compiler shipped by Apple. So a MacOS major update might on its own functionally break a gem, even if the gem itself was designed well and you were using the same Ruby version.

saila 3 hours ago||||

This might be true for libraries or utilities that have a well-defined scope and no dependencies, but that's not what the article is focused on. When considering a company's main product, it's usually never done and patterns of activity—and especially changes in those patterns—can give you insight into potential issues.

latexr 4 hours ago|||

> a project isn’t dying because of no commits. Rather it’s stable

Agreed. Assuming there are no open issues and PRs. When I find a project, if the date of the last commit is old, I next look at the issues and PRs. If there are simple-to-deal-with issues (e.g. a short question or spam) and easy-to-merge PRs (e.g. fixing a typo in the README) which have been left lingering for years, it’s probably abandoned. Looking at the maintainer’s GitHub activity graph should provide more clues.

> I often feel I need to setup bots to make superfluous commits just to make it look like my useful and stable repos are “active”

I have never done it, but a few times thought about making a “maintenance” release to bump the version number and release date, especially since I often use a variant of calendar versioning.

cynicalsecurity 7 minutes ago|||

Not interested, thank you.

newsoftheday 3 hours ago|||

I don't want to program git, I want to get stuff done so I would reject using that tool and do what the article author did running tried and true pipeable Linux/UNIX commands. It's also the same reason why I dislike Gradle and use Maven, I don't want to program my build I want to define and run my build.

nine_k 3 hours ago||

But the git commands in the article is also programming of the same kind, just using more terse, more obscure language. All the shell pipelines are sort, uniq, and grep.

A language that properly maps to the data model, and has readable identifiers is a boon. Git is a database, a database needs a proper query language.

gib444 7 hours ago|||

Hah someone really looked at jq (?) and thought: "yes, more of this everywhere". I feel jq is like marmite (edit: aka vegemite, i.e. "you either love it or you hate it")

plandis 3 hours ago|||

It doesn't seem any more egregious than something like:

`git log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --`

Which is something I see a lot of people alias in Git for viewing logs.

maleldil 5 hours ago|||

It's really not that bad, although the jq comparison might be apt. You have such primitives you need to understand, and then everything just fits together nicely. I find this much easier to write and understand than git's cryptic format strings.

Disclaimer: I love jq too :)

faangguyindia 8 hours ago|||

I can't remember all of this, does anyone know of any LLM model trained on CLI which can be run locally?

lamasery 8 hours ago|||

If you copy those commands into a file and use that file to prompt the “sh” LLM.

stingraycharles 7 hours ago||

That works until you need a small variation of any of these commands and you’re lost.

fainpul 6 hours ago||||

Try https://cheat.sh/

esafak 6 hours ago|||

Not a model, but a product: warp.dev

huflungdung 7 hours ago||

[dead]

bsuvc 7 hours ago||

I love how the author thinks developers write commit messages.

All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".

It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.

AI generated commit messages helps this a lot, if developers would actually use it (I hope they will).

ramijames 6 hours ago||

This is a team lead/CTO problem. A good leader will be explicit in their expectations that developers write good commit messages. I've certainly had good leaders that expect this.

ryandrake 5 hours ago|||

Yes, and a culture problem, too. I guess I've been blessed that I've mostly only worked for "grown up" companies, but I've never encountered a workplace where people didn't write useful commit messages. At least one line description of the work done, but often multiple lines of valuable context. Only the junior devs had to be told to do it, but once they got into the habit, everyone understood why we do it and it was no big deal.

If I joined a company where people committed their code with "stuff" or "made some changes" or "asdfhlfo;ejfo;ae," that would be a red flag that I might have joined the wrong company, and I'd start to wonder what else the developers here do carelessly.

godelski 1 hour ago|||

The same goes for code comments though people are much more vocal about their disdain. It's ironic given how frequent AI is used to generate docs. But docs are much better written by the person who wrote the code, the person who has all the context.

These things never take much time but people dismiss them because of that. Because each commit and each comment in isolation isn't very valuable but they are very helpful in aggregate. I'm not sure why this bias exists though, since the same is true for lines of code. It's also true about a ton of things. All the little things add up. Just because it's little now doesn't mean it's not important

ramijames 3 hours ago||||

Indeed. If you can't spend two minutes (MAX) writing a sentence or two explaining what the commit is for, then what are we doing as developers? Commits are for future you and your future team. They are a tool. Please, use them.

lossyalgo 3 hours ago||||

Good commit messages would be nice but honestly I would be over the moon if our pull requests would be approved within a week without having to ping one or more people.

scottyah 3 hours ago|||

Some of my favorite are the perhaps well-meaning but totally misguided log of what files they changed.

berkes 3 hours ago||||

I once tutored an intern. Who thought he was The Best Programmer On Earth (didn't we all at that age?). He refused to use revision control, it slowed him down.

So we told him to commit at least once every day, with a relevant commit message, or else fail his internship.

He worked 21 more days. There were 21 commits: "17:00, time to go home".

vntok 3 hours ago||

This reads like the intern was left to his own devices and his output not checked at all for three weeks straight. Actual tutoring would have surfaced the issue after 1 or 2 days tops.

PUSH_AX 5 hours ago||||

I think it's a stretch to measure leadership quality on something so minor, a lot of teams find them pretty useless no matter how good they are.

munksbeer 4 hours ago|||

I don't agree. These things actually matter. A developer who isn't told otherwise is just going to do whatever they feel like, so if there is nothing or no-one enforcing the standards, then the failure isn't on the individual developer, it is on the team lead. Someone needs to be setting the standards.

In the company I work for, there is a team that has isolated itself to some extent from other teams and works at a furious pace to keep their particular section of the business happy. We're lucky enough that they spun up their own repo to do their work on, so they don't actually impact other teams, but if the quality of the commit messages is anything to go by, I am 100% certain they're going to end up in a huge mess, if they aren't already. The team lead encourages this, and certainly doesn't care about commit messages etc.

Developers who care about other developers tend to write better quality code, because they care what other developers think of them. If you care about other developers, you will most likely write decent quality commit messages too.

I have seen over many years the types of developers who only care about moving their own code into production as fast as possible and getting to the next thing. There is a very high correlation with a mess at the end, which they inevitably won't have to tidy up because they'll be doing the next thing. These types of developers hate owning stuff in production, so they don't do it, so they don't actually care how maintainable their "clever" code is. I am very certain that a number of people reading this will be those types of developers.

xp84 3 hours ago||||

Useless? So you never use “git annotate” or your IDE to see who wrote a line of code whose purpose puzzles you, and go to the commit message to see what they were trying to accomplish? This is invaluable to me as long as commit messages are clear.

As a manager, one of the first things I do is make sure that the PR titles (the PR text becomes the commit messages in squash-merging workflows) at minimum begin with a ticket number. Then we can later read both the intention and the commentary on it.

PUSH_AX 3 hours ago||

> Useless? So you never use “git annotate” or your IDE to see who wrote a line of code whose purpose puzzles you, and go to the commit message to see what they were trying to accomplish?

Personally no, the code is the "truth". If I need more I'm going to open a dialog with the author, not spend time trying to interpret a 7 word commit message, "good" or otherwise.

macintux 3 hours ago|||

The code can only convey what is being done (and then, in some cases, only superficially). It can't convey what decisions were made, what alternatives were discarded, what business motivations may have led to that code.

And for old enough code, the author may not be available, or more likely doesn't remember.

PUSH_AX 3 hours ago||

Fine, but none of that is in a normal commit message, lets be real...

mgfist 1 hour ago||

Which circles back to why it's important for leadership to tackle this

PUSH_AX 1 hour ago||

Yes, but not in the form of commit messages, the parent comment described things better suited to jira tickets, documentation etc.

It feels like we're trying really hard to stretch the utility of commit messages here...

macintux 1 hour ago||

Mainly I was pushing back on: the code is the "truth"

I don't feel that is an accurate statement for any complex system.

PUSH_AX 1 hour ago||

I don't like complex systems, and I work hard not to create them.

mgfist 1 hour ago||

Sure but code can't capture everything. Maybe with enough comments I guess, but not code alone. For example, code won't tell you that this feature was timeboxed hence this edgecase was not supported

wreath 2 hours ago|||

And what of the original author is not there anymore?

PUSH_AX 2 hours ago||

The world will not end. I’ll get there.

two_tasty 5 hours ago||||

I partially disagree. Technical leadership at the micro/mid level should be able to set and enforce standards like "you must have semi-meaningful or meaningful commit messages." If and only if they set those standards, and the team does not follow them, then we can say that either the leadership is lacking, or there is a structural barrier/disincentive to following the rules. Within that framework, I do think using process-smells like this is valid for judging technical leadership.

To the point of other commenters however, I wouldn't lay something this micro at the foot of the CTO in all but the smallest of organizations.

freedomben 5 hours ago|||

It's a stretch to lay at the CTOs feet, but not the team lead or even Head/VP of Engineering IMHO. It's also easy to "enforce" if you're already doing peer review (which you definitely should be, even if not required for compliance).

drums8787 3 hours ago||||

I particularly love when the “CTO” is also the main offender.

ElijahLynn 1 hour ago|||

And in a squash and merge workflow, which are most teams I've been on the past 8 years, it really is the title of the pull request or merge request. That is what really matters.

And I really like that because it leaves room to let the developer do whatever kind of commit messages they want to that makes sense to them. Because nobody's really ever going to read those again after it squashed and merged.

tkzed49 38 minutes ago|||

Every time I hear about commit messages on HN, this is my first thought. I can't imagine not working in a squash workflow. No matter how good your commit messages are, I do not want to read all of them. The squashed commit will direct me to the original PR in case I need more detail.

skinner927 51 minutes ago|||

- fix - fix fr - fix frfr - plz - omg why - never gonna give you up - never gonna let you down - add missing curly braces

mikepurvis 6 hours ago|||

In codebases where PRs are squashed on merge, the commit messages on the main branch end up being the PR body description text, and that's actually reviewed so tends to be much better I find.

bob1029 6 hours ago|||

And in every codebase I've been in charge of, each PR has one or more issue # linked which describe every possible antagonizing detail behind that work.

I understand this isn't inline with traditional git scm, but it's a very powerful workflow if you are OK with some hybridization.

bikelang 3 hours ago||

I personally find this to be a substantially better pattern. That squashed commit also becomes the entire changeset - so from a code archeology perspective it becomes much easier to understand what and why. Especially if you have a team culture that values specific PRs that don’t include unrelated changes. I also find it thoroughly helpful to be able to see the PR discussions since the link is in the commit message.

mikepurvis 3 hours ago||

I agree, much as it's a loss for git as a distributed system (though I think that ship sailed long ago regardless). As far as unrelated changes, I've been finding that another good LLM use-case. Like hey Claude pull this PR <link> and break it up into three new branches that individually incorporate changes A, B, and C, and will cleanly merge in that order.

One minor nuisance that can come up with GitHub in particular when using a short reference like #123 is that that link breaks if the repo is forked or merged into something else. For that reason, I try to give full-url references at least when I'm manually inserting them, and I wish GitHub would do the same. Or perhaps add some hidden metadata to the merge commit saying "hey btw that #123 refers to <https://github.com/fancy-org/omg-project/issues/123>"

tormeh 1 hour ago|||

I've seen it be the concatenated individual git commit messages way too often. Just a full screen scroll of "my hands are writing letters" and "afkifrj". Still better than if we had those commits individually of course, but dear god.

The gold standard is rebased linear unsquashed history with literary commits, but I'll take merged and squashed PR commits with sensible commit messages without complaint.

AStrangeMorrow 24 minutes ago|||

I also like meaningful commit names. But am sometimes guilty of “hope this works now” commits, but they always follow a first fix that it turns out didn’t cut it.

I work on a lot of 2D system, and the only way to debug is often to plot 1000s of results and visually check it behaves as expected. Sometimes I will fix an issue, look at the results, and it seems resolved (was present is say 100 cases) only to realize that actually there are still 5 cases where it is still present. Sure I could amend the last commit, but I actually keep it as a trace of “careful this first version mostly did the job but actually not quite”

lopis 4 hours ago|||

If developers don't write commit messages, that's a culture problem. At my company we demand that of each other.

grepsedawk 6 hours ago|||

Only two of the five depend on commit messages. Churn, authorship, and velocity work regardless. Even teams with terrible hygiene write "fix" when something breaks.

SoftTalker 4 hours ago|||

As noted, authorship does not if commits are squashed, which seems to be common (I never do it).

KronisLV 4 hours ago|||

> Even teams with terrible hygiene write "fix" when something breaks.

They might not include anything but the Jira ticket number, if the environment is truly lacking.

travisgriggs 5 hours ago|||

Our small team has a lot of commit messages like this. For a while, we had a guy on the team who had come from a site that expected more. The pet peeve he brought along was that commit messages end with a period (my guess is that someone at their previous work place had reasoned that forcing periods encouraged developers to actually write meaningful sentences). When I look at that period of development, I see lots of messages like “stuff changed.” And “more stuff changed.” And then it goes back to just “stuff changed” around the time they moved on.

gcarvalho 1 hour ago||

> my guess is that someone at their previous work place had reasoned that forcing periods encouraged developers to actually write meaningful sentences

I have actually seen proper capitalization and correct conventional-commit types to correlate very well with the author being intentional and the patch being of good quality.

e.g.

- (a) chore: update some_module to include new_func

- (b) feat: Add new_func to handle XYZ case

Where:

(a) is not a chore, as it changes functionality, is uncapitalized and is so low-signal I can probably write a 10 line script to reliably generate similar titles.

(b) is using the correct "feat" commit type, capitalized and describe what this is for. I expect the body to explain "why", as well, and not to reiterate the "how" in natural language.

This is just my experience, but I've seen commit messages where people actually put in some effort to usually come with a good patch, and vice-versa.

heinrichhartman 5 hours ago|||

I personally use git commit -m "." for: "Just snapshots this state real quick" on a feature branch.

main branch is advanced on PR level, with squashed commits.

So the "." should never make it to main, and have PR description as commit message.

yreg 4 hours ago|||

> It's a small minority

Is it really a small minority? I have never worked on a project that didn't have commit messages that at least tried to be descriptive (sometimes people fail at it but its very different to an outright "changed stuff").

I don't remember any friend mentioning to me them encountering a work project where the messages were totally neglected either.

brabel 2 hours ago||

Never seen that in any company I worked at either and I can’t believe professional developers seem to think that it would be ok to write meaningless commit messages. That’s just so sloppy.

loremium 3 hours ago|||

tbh I'm not convinced that a git log history should be treated as a group journal because it's not.

relying on git commit messages assumes they're correct by convention since there is no technical constraint to enforce it. and it assumes no work in progress commits, sometimes it's just necessary to hit the save button real quick or move a workspace from one device to another.

my point is: git is a way of storing and loading files at its core.

kelnos 3 hours ago|||

Bad commit messages always fail PR review for me. It requires will and discipline, but it's not that hard.

sigmoid10 7 hours ago|||

Only two of the five insights are based on commit messages and the author acknowledges that they won't work in projects without message discipline. But the remaining ones will give you valuable insights even into the most lazy project department.

max8539 2 hours ago|||

Sometimes it could be just a ticket number/title

bartvk 47 minutes ago||

I think that's pretty great, actually. You can look that up to see more info about the commit.

harryquach 4 hours ago|||

Commit messages are often squashed after merging a feature branch

8cvor6j844qw_d6 7 hours ago|||

> AI generated commit messages

git log --oneline and a sprinkle of your personal sauce on .claude goes a long way :)

itmitica 7 hours ago|||

I love how the commentator thinks a developer makes decisions based on commit messages.

Random, subjective, or written in a state of mental exhaustion commit messages.

I also love the switcheroo the author made: git not logs. But hey :)

stronglikedan 5 hours ago||

It's because the vast majority of commit messages are never read by anyone, and there's other ways to fund out what happened in the handful of cases where you would need to.

mkehrt 1 hour ago|||

I read commit messages all the time to figure out what a change was about.

For small personal projects I often write one phrase messages with `-m`, but if you're working with other people you should be writing good commit messages.

scottyah 3 hours ago|||

git blame's are amazing, and rely on good comments.

joshstrange 6 hours ago||

I ran these commands on a number of codebases I work on and I have to say they paint a very different picture than the reality I know to be true.

> git shortlog -sn --no-merges

Is the most egregious. In one codebase there is a developer's name at the top of the list who outpaced the number 2 by almost 3x the number of commits. That developer no longer works at the company? Crisis? Nope, the opposite. The developer was a net-negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.

kmacdough 2 hours ago||

Everything in context. This is one of many reasons I'm a proponent of squash-and-merge. If a change really needs more than one permanent commit, it should probably be split up or if absolutely necessary should be on a feature branch maintaining similar process. Under this process, feature branches are not squashed.

This leaves developers to commit locally and comment as much or little as they like.

dgunay 58 minutes ago|||

Yeah. I am the top committer at my current workplace, but I'd say that a majority of that gap is because my particular workflow results in many smaller commits than my coworkers.

Aperocky 4 hours ago|||

Also - be careful of automated workflow that uses a single persons credential.. skews this by a lot.

kevstev 32 minutes ago||

Once word got out that a report was going up to a department head around commit frequency, a few of us started to make "backup commits" to boost our stats. Whether it be dev server config files (just in case!), local dev setups, whatever.. just something that changed enough on its own but would produce a steady stream of commits, while having some potential use case, however unlikely it was to actually be needed.

Modern problems require modern solutions and all...

fenaer 5 hours ago|||

So that person, on one central codebase at a company I work for, is me.

Assuming I'm not ego-mad, I like to think this is because I built the project from the ground up before handing it over to the rest of the team.

These days other people commit more often than I do, but my name is still dominant, and probably will be for some time.

joshstrange 5 hours ago||

I'm not saying more commits = bad developer. In my example that happened to be the case but not because they had a lot of commits but because they were bad at their job. I was just trying to warn that taking these git snippets at face-value does not paint the full picture.

If someone came to me and said "I ran these and I see XXX was the most prolific committer and they left X months ago, what will be do???" I'd have to work hard not to laugh.

Since these snippets are self-described as ways to get familiar with the code/projects I wanted to provide the counter point. Most of those snippets do not at all paint the real picture and for all the repos I tested it on they paint the opposite of reality.

I know these codebases like the back of my hand, the purported purpose of these snippets is to better understand the codebase, I can tell you they don't work for anything I tested them on. Maybe they work for other codebases but the sample size I have access to says they don't work for me.

troyvit 5 hours ago||

I haven't finished the article yet but I think your point is an important one, and that's to run the commands with a context in mind. The article seems to be coming from the perspective of somebody who is brand new to the project, and as your experience indicates, interviewing teams and leads before running those commands might add more understanding to what they're telling you.

ramon156 9 hours ago||

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

The most changed file is the one people are afraid of touching?

rbonvall 8 hours ago||

Just like that place that's so crowded nobody goes there anymore.

dewey 9 hours ago|||

I've just tried this, and the most touched files are also the most irrelevant or boring files (auto generated, entry-point of the service etc.) in my tests.

nulltrace 8 hours ago|||

Yeah same thing happens with lockfiles and CI configs. You end up filtering out half the list before it tells you anything useful.

pydry 7 hours ago||||

I just tried it too and it basically just flagged a handful of 1500+ line files which probably ought to be broken up eventually but arent causing any serious problems.

Cthulhu_ 7 hours ago||

If it's (like in my case) dependency management, localization or config files, breaking them up will likely only cause more issues. Make sure that it's an actual improvement before breaking things up.

jbjbjbjb 8 hours ago|||

This command needs a warning. Using this command and drawing too many conclusions from it, especially if you’re new, will make you look stupid in front of your team mates.

I ran this on the repo I have open and after I filtered out the non code files it really can only tell me which features we worked on in the last year. It says more about how we decided to split up the features into increments than anything to do with bugs and “churn”.

functional_dev 2 hours ago|||

I found it interesting, that Git itself has built in similarity notion... when it packs objects, it groups files by path+size, runs delta cmpression to find which are close.

Very different from just counting commits - https://vectree.io/c/delta-compression-heuristics-and-packfi...

Pay08 7 hours ago||||

Good thing that the article contains that warning, then.

jbjbjbjb 7 hours ago||

Not really strong enough in a post about what to do in a codebase you’re not familiar with. In that situation you’re probably new to the team and organisation and likely to get off on the wrong foot with people if you assume their code “hurts”.

hamburglar 3 hours ago|||

The post is “here’s what I do”, not “here’s what you should do and then confront the team about the results.” It’s just showing you a quick way to get some insights. It’s not even guaranteeing it’s accurate, just showing you some things you might be able to draw some quick conclusions on.

I’m not sure why HN attracts this need to poke holes in interesting observations to “prove” they aren’t actually interesting.

jbjbjbjb 2 hours ago||

It’s a bit reductive to call it poking holes. The author shared his valuable knowledge and I shared mine.

mayama 5 hours ago||||

These commands are just about what files to start looking at to understand new codebase.

Eldt 6 hours ago|||

Better for people to know they're just blindly copying tools and parroting their output as if it's automatically meaningfully. Any warning against that should be built into the individual, for their own sake

thiisguy 2 hours ago||

Right? Some of these comments feel “you gave me commands to run and I should be able to turn my brain off to interpret the outputs”. These aren’t newbie commands so the assumption would be that you kinda know what you’re doing at least a little bit. If not, then don’t run them… similar to how you should approach all commands/things from the internet

berkes 3 hours ago|||

Plotting Churn against Complexity is far more useful than merely churn.

It shows places that are problematic much better. High churn, low complexity: fine. Its recognized and optimizef that this is worked on a lot (e.g. some mapping file, a dsl, business rules etc). Low churn high complexity: fine too. Its a mess, but no-one has to be there. But both? Thats probably where most bugs originate, where PRs block, where test coverage is poor and where everyone knows time is needed to refactor.

In fact, quite often I found that a teams' call "to rewrite the app from scratch" was really about those few high-churn-high-complexity modules, files or classes.

Complexity is a deep topic, but even simple checks like how nested smt is, or how many statements can do.

mememememememo 9 hours ago|||

Yes. Because the fear is butressed with necessity. You have to edit the file, and so does everyone else and that is a recipe for a lot of mess. I can think back over years of files like this. Usually kilolines of impossible to reason about doeverything.

mchaver 8 hours ago|||

Definitely not in my experience. The most changed are the change logs, files with version numbers and readmes. I don't think anyone is afraid of keeping those up to date.

zikani_03 2 hours ago|||

pom.xml and package.json came up on couple of separate projects I ran the commands on. Which makes sense because the versions get bumped rather frequently. I guess context matters, as usual.

jollyllama 6 hours ago|||

Yeah, the truth is going to be a lot more subtle than this.

furyofantares 5 hours ago|||

The LLM that wrote the copy is an idiot.

jamwil 3 hours ago||

This is such obvious LLM slop.

szszrk 9 hours ago|||

Could be also that a frequently edited file had most opportunity to be broken. And it was edited by the most random crowd.

KptMarchewa 7 hours ago||

In my case, it's .github/CODEOWNERS.

Nobody is afraid of changing it.

mayama 5 hours ago||

Why does github owners need frequent change? Do members in you team change so often?

mattrighetti 8 hours ago||

I have a summary alias that kind of does similar things

  # summary: print a helpful summary of some typical metrics
  summary = "!f() { \
    printf \"Summary of this branch...\n\"; \
    printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
    printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
    printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
    printf \"%d commit count\n\" $(git rev-list --count HEAD); \
    printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d tag count\n\" $(git tag | wc -l); \
    printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
    printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
    printf \"\nSummary of this directory...\n\"; \
    printf \"%s\n\" $(pwd); \
    printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
    printf \"%d file count via find command\n\" $(find . | wc -l); \
    printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
    printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
    printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
    printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
  }; f"

EDIT: props to https://github.com/GitAlias/gitalias

duskdozer 8 hours ago||

Curious - why write it as a function in presumably .gitconfig and not just a git-summary script in your path? Just seems like a lot of extra escapes and quotes and stuff

mattrighetti 8 hours ago|||

It's a very old config that I copied from someone many years ago, agree that it's a bit hard to parse visually.

Cthulhu_ 7 hours ago||||

Not the poster, but one theory: so you only need to copy one file. Portability.

mr_mitm 6 hours ago|||

Looks like the above assumes a POSIX shell, so one could argue a dedicated script would actually be more portable.

TonyStr 8 hours ago|||

Looks nice. Unfortunately I don't have log-of-count-and-email, log-of-count-and-day or churn

petey283 8 hours ago||

+1 Same.

Edit. https://github.com/mattrighetti/dotfiles/blob/master/.gitcon...

ape4 7 hours ago||

You could make a local `man` page.

JetSetIlly 9 hours ago||

Some nice ideas but the regexes should include word boundaries. For example:

I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.

nozzlegear 55 minutes ago||

This needs a small tweak to work on macOS, where git uses the POSIX version of grep (which doesn't support `\b`). You need to use the Perl Regexp option by switching -E with -P:

grepsedawk 6 hours ago||

Good catch, that's better

whstl 7 hours ago||

> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

In my experience, when the team doesn't squash, this will reflect the messiest members of the team.

The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.

Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.

icedchai 6 hours ago||

I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.

fpoling 6 hours ago|

The author does not look at counter values but rather at how the values changes. That reveals dynamics.

icedchai 5 hours ago||

My comment still seems relevant? Do frequent commits to correct mistakes imply more "value" than infrequent, but well tested, commits, or what? I don't think it is a reliable signal.

blenderob 6 hours ago||

> Is This Project Accelerating or Dying > > git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c

If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?

heresie-dabord 4 hours ago||

For this command in particular, one can add a cheap bar chart with awk:

git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | awk '{printf $2" "; for (i=1;i<=$1;i++){printf "-";} print ""; }'

zikani_03 1 hour ago||

This is a neat trick for a quick visual presentation, thanks!

goosejuice 4 hours ago|||

Yeah, this one demonstrates a particularly pernicious view of software development. One where growth, no matter how artificial, is the only sign of success.

If you work with service oriented software, the projects that are "dying" may very well be the most successful if it's a key component. Even from a business perspective having to write less code can also be a sign of success.

I don't know why this was overlooked when the churn metric is right there.

BeetleB 3 hours ago||

Bad memories at my former big tech company.

Whenever we initiated a new (internal) SW project, it had to go through an audit. One of the items in the checklist for any dependency was "Must have releases in the last 2 years"

I think the rationale was the risk of security vulnerabilities not being addressed, but still ...

ziml77 5 hours ago|||

That was my question too. I have plenty of projects I've worked on where they rarely get touched anymore. They don't need new features and nothing is broken.

SoftTalker 4 hours ago||

Is it fair to say they are being "worked on" if nothing is being done?

duckmysick 3 hours ago||

Sometimes you need to bump a dependency version, adjust the code to a changed API endpoint, or update a schema. Even if the core features stay the same, there's some expected maintenance. I'd still call that being worked on, in a sense that someone has to do it.

onion2k 6 hours ago|||

Technically you're correct that change frequency doesn't necessarily mean dead, but the number of projects that are receiving very few updates because they're 'done' is a fraction of a fraction of a percent compared to the number that are just plain dead. I'm certain you can use change frequency as a proxy and never be wrong.

Supermancho 5 hours ago||

> I'm certain you can use change frequency as a proxy and never be wrong.

I (largely) wrote a corporate application 8 years ago, with 2 others. There was one change 2 years ago from another dev.

Lots of programs are functionally done in a relatively short amount of time.

"Accelerating or Dying" sounds like private equity's lazy way to describe opportunity, not as a metric to describe software.

onion2k 5 hours ago||

That sort of project exists in an ocean of abandoned and dead projects though. For every app that's finished and getting one update every few years there are thousands of projects that are utterly broken and undeployable, or abandoned on Github in an unfinished state, or sitting on someone's HDD never be to touched again. Assuming a low change frequency is a proxy for 'dead' is almost always correct, to the extent that it's a reasonable proxy for dead.

I know people win the lottery every week, but I also believe that buying a lottery ticket is essentially the same as losing. It's the same principle.

goosejuice 3 hours ago||

With respect, this is a myopic view. Not all software is an "app" or a monolith. If you use a terminal, you are directly using many utilities that by this metric are considered dying or dead.

dan-bailey 6 hours ago|||

Projects become more stable with time? Since when?

Sharlin 6 hours ago|||

Something something Red Queen's race

stackedinserter 6 hours ago||

Or you hired someone who squashes or doesn't commit every single change.

croemer 8 hours ago|

Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)

markus_zhang 7 hours ago||

I also feel this reads like an AI slop, but at least I learned 5 commands. Not too bad.

More comments...