Posted by hemogloben 4 days ago
I would like to comment on this:
> I have been asked countless times if it's better to merge or to rebase and while I never want to stir up a hornet's nest, I have always advocated merging over rebasing.
I've been involved in this discussion many times as well, and the correct answer is that one isn't inherently "better", and you shouldn't _always_ prefer one over the other. There are situations when a merge is preferable (e.g. to keep a branch in history), and others when a rebase is (e.g. to, well, _base_ some work on a specific commit). The choice of when to use either will depend on the author's or team's preference in each case, which is why it's given as an option in most web-based PR/MR workflows. Squashing is another task you don't want to always do either.
I partly blame this confusion on Git's UI, and on the baseless fears spread about rebasing for years, which many developers mistakenly absorbed. The number of times I've heard that force-pushing after a rebase is "dangerous" is too high. No wonder people find it scary...
They're really not.
First of all, no data is really lost with Git. Commits can be recovered from the reflog if they haven't been garbage collected, and there are ways of recovering anything on GitHub as well[1], even if it technically shouldn't be the case.
But this aside, data loss is circumstantial, like you say. I've heard the idea that force-pushing in general is harmful, when it's really not if you're working solo or on an isolated branch. Rebasing and force-pushing are just different tools in the toolbox.
In general, my objection is to the practice of describing any software as "dangerous". It creates an air of intimidation that prevents people from using the tools to their full extent, which when spread can popularize wrong practices among new users as well. This is why you see the person in the article claiming that they've always been a "merger", having a false dilemma between merging and rebasing, and describing their solution as "fearless". This line of thinking is also commonly associated with the command line and Linux itself, and is just harmful.
Instead, users should be educated on what the software does, which does require having comprehensive UIs and documentation, and designing the software with sane defaults, fail-safes, and ways to undo any action. Git doesn't do a great job at all of these, but overall it's not so bad either. What really hurts users is spreading the wrong kind of ideas, though.
So no data is really lost except when it is.
> I've heard the idea that force-pushing in general is harmful, when it's really not if you're working solo or on an isolated branch. Rebasing and force-pushing are just different tools in the toolbox.
Like I said, there are specific circumstances where you can do it safely. But that's very different from being safe in general.
> This is why you see the person in the article claiming that they've always been a "merger", having a false dilemma between merging and rebasing, and describing their solution as "fearless". This line of thinking is also commonly associated with the command line and Linux itself, and is just harmful.
> Instead, users should be educated on what the software does, which does require having comprehensive UIs and documentation, and designing the software with sane defaults, fail-safes, and ways to undo any action.
Users can't and won't learn the full details of everything they use, especially "secondary" tools that they use to support their main workflow - and why should they, unless the benefits are large enough to justify that cost? Using a tool in a mode that is inherently safe rather than a mode that can cause data loss in some circumstances is a perfectly reasonable choice. "Fearless" and "dangerous" are perfectly reasonable ways to characterise this distinction.
The period before inaccessible objects are pruned automatically is quite relaxed by default (between 2 weeks and 90 days, depending on the object), and it is configurable. So the scenario we're discussing here where data is lost by a force-push is just not a practical concern.
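For anyone who wants to see the "it is configurable" part, here's a rough sketch. The three keys are the gc settings I understand to control these windows; the values are made up for illustration and set only in a throwaway repo.

```shell
#!/bin/sh
# Sketch: lengthen the retention windows in a throwaway repo.
# The values below are illustrative, not recommendations.
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .

# How long reflog entries for reachable commits are kept.
git config gc.reflogExpire "180 days"
# How long reflog entries for commits no longer reachable from the branch tip are kept.
git config gc.reflogExpireUnreachable "60 days"
# How old a loose unreachable object must be before gc may prune it.
git config gc.pruneExpire "1 month ago"

reflog_expire=$(git config gc.reflogExpire)
unreachable_expire=$(git config gc.reflogExpireUnreachable)
prune_expire=$(git config gc.pruneExpire)
echo "reflogExpire=$reflog_expire"
echo "reflogExpireUnreachable=$unreachable_expire"
echo "pruneExpire=$prune_expire"
```

Use `git config --global` instead if you want the same settings for every repo on the machine.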
> Like I said, there are specific circumstances where you can do it safely. But that's very different from being safe in general.
No, force-push is safe _in general_. It is a bit inconvenient to recover the inaccessible commits if someone makes a mistake, but this doesn't make rebasing or force-pushing unsafe.
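For the "someone makes a mistake" case, one guard I'd assume most people reach for is `--force-with-lease`, which refuses to overwrite a remote branch that moved since you last fetched it. A minimal sketch with two hypothetical clones (alice and bob) of a local bare repo:

```shell
#!/bin/sh
# Sketch: --force-with-lease rejects a push when the remote moved behind your back.
# alice, bob and remote.git are hypothetical names in a temp directory.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

git init -q alice
git -C alice symbolic-ref HEAD refs/heads/main
git -C alice remote add origin "$work/remote.git"
echo one > alice/file.txt
git -C alice add file.txt
git -C alice commit -q -m "first"
git -C alice push -q origin main

# bob clones now, so his origin/main points at "first".
git clone -q remote.git bob

# alice pushes more work that bob never fetches.
echo two > alice/file.txt
git -C alice commit -q -am "second"
git -C alice push -q origin main
alice_tip=$(git -C alice rev-parse HEAD)

# bob rewrites his stale copy and tries to force it over the remote.
git -C bob commit -q --amend -m "first (reworded)"
if git -C bob push -q --force-with-lease origin main 2>/dev/null; then
  lease=accepted
else
  lease=rejected
fi

remote_tip=$(git --git-dir=remote.git rev-parse refs/heads/main)
echo "lease push: $lease"
```

A plain `--force` in the same spot would have replaced alice's second commit on the remote. It would still be in her clone and its reflog, which is the "inconvenient to recover" part.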
> Users can't and won't learn the full details of everything they use, especially "secondary" tools that they use to support their main workflow
Huh? How is Git a "secondary" tool for a programmer? It is an essential part of the programmer's toolkit as much as an editor is, and understanding and being proficient with both is equally important. Users in this case should be expected to learn the tools they will be relying on for a large part of their career. Compared to the complexity of programming environments, stacks and languages we deal with on a daily basis, this tooling is fairly simple to grasp.
I'm not saying that Git doesn't have issues that could be improved (it certainly does), but in the grand scheme of things it is a simple, reliable and well-engineered piece of software.
(And then I don’t think there are any more “force” flags left to worry about…!)
https://stackoverflow.com/questions/65837109/when-should-i-u...
Personally, I'm lazy and just always have it in recent substring match history in the shell
That depends entirely on your organization's (or project's) preferred branching strategy and what is accepted as a unit of change. Some accept entire features as a single commit (via squash merging dev/feature branches - very useful when you have to maintain multiple release branches and can easily cherry-pick features & bug fixes): here, merges have the advantage. Other places care a lot about the individual commits and preserving commit history from dev/feature branches - here merges can hide some of that granularity, and rebases are a better fit. The latter is common for projects with one evergreen release branch and no concern about back-porting features or fixes to other, currently supported release branches. By contrast, supporting versions N, N-1, and N-2 is common in enterprise software, with each having its own release branch or tag.
To me it's virtually zero in seven years but it might be due to the teams and projects I've been involved with.
That work is easier when they haven’t squashed their changes. Because I can see how they got there and if it was a mistake or a misunderstanding.
People who prefer squash are an automatic red flag because they usually don’t like asking Why, which is a very important skill on products that are shipping and making money.
That sounds like a problem with the people you work with, not with squashing in general.
>People who prefer squash are an automatic red flag because they usually don’t like asking Why, which is a very important skill on products that are shipping and making money.
This is a wild generalization. Thoughtful people squash when they think they have a set of changes that go together. If someone is jamming together stuff that does not go together then that is indeed a problem, but not a problem with squash. Nobody really wants to see the 50 edits someone made to come up with one final change.
No, it says something about me, not them. When people can't figure out problems on their own they come to me for help. Have been since I was a sophomore in college, which was a long ass time ago. Possibly before you were born (8 month account). So I have a pretty good idea where 'rock bottom' is for every class of tool I've ever used, and how often people get close to them.
I also get called in to look at bugs that other people refuse to believe exist, and bug forensics is where you really, really see the difference between a good commit history and a shitty one. If you aren't using 'git annotate' weekly or daily then you are not qualified to comment on how merges should or shouldn't be done. "I don't use it" means you don't have an opinion. "... so you shouldn't use it" is telling your coworkers you don't give a shit.
> This is a wild generalization
I think you're confusing red flag with deal breaker.
> Thoughtful people squash when they think they have a set of changes that go together.
True but useless distinction. Define 'go together'. Everyone has a different definition of this and you will never reach consensus there. Most of the people I'm thinking of here think everything for a single story 'goes together'. This is how you get an initial commit for a new module with 600+ lines of code and eight bugs you have to solve the hard way because all of the bugs showed up in a single commit.
Squashing before a PR fails Knuth's aphorism about code being meant to be read by humans and only incidentally by machines.
If you don't like that it took you three tries to figure out an off by one error in your code, that's fine. But you don't have to destroy all other evidence of your other processes in order to cover up your brainfart.
It's incredibly powerful for (just from decent commit messages) figuring out why some little detail in the code is the way it is.
I'm thankful every day that I get to mandate Gerrit (so rebased-patches-on-top-of-main) workflow with every individual commit going through CI.
ETA: Incidentally, I'm usually also someone who often gets called in to figure out obscure-yet-important bugs... and the commit log is instrumental to that.
OK I totally see that now.
Speaking of red flags, your whole comment is a red flag to me, just like mentioning that common workflows are "red flags" lol.
>If you aren't using 'git annotate' weekly or daily then you are not qualified to comment on how merges should or shouldn't be done. "I don't use it" means you don't have an opinion. "... so you shouldn't use it" is telling your coworkers you don't give a shit.
More narcissistic garbage takes. There are many ways to work and if someone doesn't do it your favorite way then that doesn't mean they are reckless, incompetent, or whatever. If you told this to anyone I work with or have ever worked with in real life in the last 20 years, you'd get laughed at. I might know a lunatic who would argue with you in real life but even he might not be motivated enough to take the bait. He is a very junior-minded person as well, whose experience does not match his interests.
>Squashing before a PR fails Knuth's aphorism about code being meant to be read by humans and only incidentally by machines.
This is too reductive. You have to use common sense when squashing stuff. If you put stuff together that does not go together, then it gets harder to figure out what a changeset is supposed to do.
>If you don't like that it took you three tries to figure out an off by one error in your code, that's fine. But you don't have to destroy all other evidence of your other processes in order to cover up your brainfart.
There need be no evidence of "processes" in the end. I can see why you might want that if you're helping your coworkers figure something out. But once it's figured out then those changes should be reduced to modular changesets that each do a particular thing. Anything else will introduce pointless noise into the codebase. If you feel that some particular state of the code represents something significant, you can make a commit for that. But certainly 80% of the commits most people make are purely noise.
History is preserved in the branch along the PRs if needed, and it rarely is.
I'm not saying that rebasing is useless (I default to it), I'm debating if the effort is worth it in engineering terms, which I generally don't see because the benefits seem to be small compared to the cost.
As a git merge fan, are there any tips or tricks you suggest beyond the stock git experience when doing git merge to minimize the amount of merge conflicts you get?
I found it was especially bad when doing a git merge on a refactor, but I admit it could just be that I abandoned git merge earlier in my career before switching to rebase and never properly learned it
My most common use cases are feature or bug branches with a lifespan between less than one day and up to one month (although I absolutely have some features on pause for even over a year, in which case interactive git rebase and partially squashing WIP commits is my current method of updating)
All this is for repos from literally just me, to a few changes a month between 3 devs, to 5-2 devs doing multiple commits per day, to some open source projects with commits landing every few minutes from multiple devs if it's like a release day
My current biggest issue with rebase is verified commits with GitHub and a bit of guilt for rewriting committed feedback from other authors on my PRs
The only time I really use git merge is when I want to see how my work interplays with more than one feature branch at once, or if the feature branch I want to integrate hasn't been rebased in a bit and conflicts occur
The first time this happened (that I caught) I had two engineers who were sniping at each other. One was older "Max" and not great at data structure algorithms. The other "Stan" was a decent coder but had a bad attitude and was awful with git. Somehow he thought he could raise his status by getting Max kicked off the team.
I come back from lunch one day and Stan is bitching about a bug in Max's new code that's causing issues. To keep these two from fighting I've been reviewing all of Max's PRs and the line of code Stan is complaining about I know for a fact I checked, and was relieved to see Max got it right the first time. But sure enough, the repo says Max fucked it up.
Twenty minutes of git archaeology later and sure enough, Stan messed up a merge and resolved the conflict wrong, introducing the phantom bug. So I showed him the step by step of my diagnosis and then we had another little talk about using rebase.
For what it's worth, I agree for most projects I've been on. I've rarely e.g. used deep Git history forensics to figure out a regression, or to figure out why some code is the way it is. Usually I'm just tracking down the fairly recent squashed commit of a pull request that introduced the problem and it's obvious enough where to look to fix it.
I like the idea of clean, super fine-grained commits with good summaries but I never see people mention that this takes extra time to do, because putting a pull request together is usually a messy iterative process, and not a predictable sequence of clean independent commits.
Real work is more like "Add sketch of code ... Iterate some more ... Fix bug ... Iterate some more ... Upgrade library ... Really fix the bug ... Clean up ... Merge from main and get working ... Refactor ... Add comments ... Fix PR requests". Rebasing as you go or going back at the end to break that into chunks that will each independently make sense and pass tests costs a lot of time? Maybe I'm missing something?
The time vs benefit trade-off is probably different with huge teams and huge projects, but for solo projects, small teams, and medium projects the trade-offs are different.
Feels similar to test suite discussions. People don't mention there's a cost vs benefit trade-off to how fine grained your tests should be for different scenarios as it depends on a lot of factors you need to balance.
Of course if you're planning a hard fork, merges may be unavoidable. But I've seen too many Franken-linux-kernels which were forked from 4.x with periodic merges whose correctness is impossible to verify. Inconsistencies eventually build up with each merge.
The linear commit history created by rebasing made it trivial to bisect and determine what introduced the problem.
Huge difference to my productivity.
In your situation I'd prefer merges because: if commit X used to have parent A, and you move it over to parent B, it gets a new commit hash and a version of the code that has never been tested. If that commit is broken: was it broken when the author wrote it, or did it only break when you rebased? You threw away your only means of finding out when you rewrote history.
People who prefer git rebase workflow will hate the complicated history they see in "git log", but otherwise it will be the same.
Alternatively, the right way to use "git merge" is to merge every successive commit of a branch one by one.
The problem with "git merge" is that it collapses multiple commits into one giant patch bomb.
If one of the commits caused a problem, you don't have that commit isolated on the relevant stream (the trunk) where you are actually debugging the problem.
You know that the merge introduced a problem, and it seems that it was a particular commit there. But you don't have that commit by itself in the stream where you are working.
It can easily be that a commit which worked fine on a branch only becomes a problem in its merged form on the trunk, due to some way a conflict was resolved or whatever other coincidence or situation. Then, all you know is that the giant merge bomb caused a problem, but when you switch to the branch, the problem does not reproduce and thus cannot be traced to a commit.
If that commit is individually brought into the trunk, the breakage associated with it will be correctly attributed to it.
In both cases, the source material is the same: the original version of the commit doesn't exhibit the problem on its original branch.
It is pretty important to merge the individual changes one by one, so that you are changing fewer things in one commit.
People like rebase because it does that one by one thing. Git rebase breaks the relationship by not recording the extra parents, but since they have the reworked version of each change on the stream they care about, they don't care about that. Plus they like the tidy linear history.
As we all test different parts of the microprocessor and the tagging system reflected those parts, I could rule stuff out by looking at git log --oneline. The commit messages were also required to be high quality and I could get a gut feeling about what stuff a commit would touch without looking at the code.
> if commit X used to have parent A, and you move it over to parent B, it gets a new commit hash and a version of the code that has never been tested. If that commit is broken: was it broken when the author wrote it, or did it only break when you rebased? You threw away your only means of finding out when you rewrote history.
This happened semi-frequently. We were using Gerrit and had every version of a rebased commit visible together. When code that fails automated testing got submitted, it immediately caused CI failures for everyone. It took an hour for someone unfamiliar with the code to look at the timestamp the failures began, find the commit that caused the failures, and revert it.
I don't see how this would be meaningfully different in a merge scenario, because the merge commit also wouldn't be tested.
Why wouldn't it? This is the "not rocket science" rule of software engineering: every commit must pass the tests. There's no special exception for merge commits.
Not a case of the company being too cheap to spend the money, because there literally aren't enough engineering prototypes in the world to satisfy our CI needs for testing on them.
I have spent hours rebasing on very active branches when a merge would've taken minutes (as many colleagues do) just because "it's a best practice" but I've never got to fully appreciate the benefits.
The main reason people want to rebase instead of merging is to keep the commit history from looking like a bowl of spaghetti. A commit history like that is hard to navigate, and more likely to contain a lot of frivolous edits.
You should always do the easiest thing that gets you what you want, otherwise you're just doing pointless work. If you and your colleagues are happy merging and you find that easier, that's what you should be doing.
Rebasing supports a totally different workflow. With a rebase I can submit a well-formed set of changes for review that are conflict free. You can't do that with merging. With merging you submit a bunch of crap "history" that nobody will ever look at and the project maintainer has to deal with the conflicts.
Merge commits look like this (newest to oldest):
* Final tweaks
* Merge master branch
* Implement bar
* Fix foo
* Merge master branch
* Shit I did on Wednesday before lunch
* Implement foo
* End of day
Totally impenetrable mess that nobody will ever look at.
Rebased commits look like this:
* Add customise option to UI
* Add use case baz
* Extend model to support bar
* Refactor model foo
These can be reviewed in isolation and when approved they merge without conflict.
If you want to do this but none of your colleagues are on board and you don't have the swing to make them, then I'm sorry. But you are wasting your time rebasing in that case. :(
The merge version looks like the way code is actually written in practice to me, so doesn't the rebased version take extra time to create after you're done adding code? E.g. the "Add use case baz" rebase commit isn't likely to be a simple squashing of commits from the merge version, but cherrypicking specific lines from multiple commits.
I fully agree the rebased version is nicer, but I'm not seeing anyone talk about how much extra time it takes. Or you're doing it in a way that doesn't take much time somehow?
I'm not saying that every single feature must be split into multiple commits. If it's one change then just keep amending that one change as you go. But quite often you'll identify standalone changes as you go, like refactors, little unrelated bugfixes you find as you go etc. When this happens I'll commit that unrelated change separately and rebase to reorder it so it comes first, then continue amending my feature commit.
The paradigm shift for a lot of people is not to think of git as tracking history. Nobody cares about that. It's useless to you and doubly useless for everyone else. Think instead about tracking changes. I don't need to know every key press, every dead end explored or what you did on Tuesday afternoon. I want to know what changes are being applied to the project.
Maybe it depends on the kind of feature as well? I do the rebase with clean separate commits approach when it's easy, where reordering and squashing commits doesn't create tricky conflicts.
But for more exploratory stuff like UI/UX changes where I'm moving blocks of the UI around, and making changes in multiple files to add plumbing to get data where it needs to go, and changing it after demoing and getting feedback, it can get really messy with lots of dead-ends you backtrack out of later.
For that kind of work, it's probably easier to start again in a new branch, figure out some logical way to group the changes, then copy in code snippets from the other branch rather than rebasing? I can't see how this would be worth the effort in most cases though. The more granular commits helps figuring out where a bug got introduced, but then I don't think this happens often and when it does it's usually pretty obvious which lines of code caused the bug even in a large commit e.g. if dates are now being formatted weirdly, look for changes to code that does stuff with dates.
I never need to rebase, or unfuck a botched rebase or go reflog diving - and the commit history is linear where it matters.
You can do that with merge just as easily though - just merge master into your branch.
Bisect only makes sense when commits are rebased into changes. The moment you bring in a regression you've fucked your ability to effectively bisect.
So your automated bisect tells you to look at two whole commits instead of one. Big deal.
If you keep history as-is, most commits will compile and pass tests because coders tend to compile and run tests as part of their work cycle (and the occasional isolated non-compiling or non-test-passing commit isn't a problem for a bisect). If you rebase you will end up with long chains of commits that don't compile unless you have some additional mechanism to prevent that.
> really paid off in engineering terms?
When you want your changes accepted by upstream and they either
1. Won’t accept a merge-filled history
2. Indirectly won’t because they accept changes by email (can’t send merges by email)
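To illustrate point 2, here's a small sketch, assuming the email workflow means `git format-patch`: the patch series only carries ordinary commits, so a merge commit on your branch simply doesn't show up in it. Branch and file names are made up.

```shell
#!/bin/sh
# Sketch: format-patch emits one patch per non-merge commit; merges are left out.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > f1.txt
git add f1.txt
git commit -q -m "feature 1"

git checkout -q main
echo main > m1.txt
git add m1.txt
git commit -q -m "main work"

# The feature branch catches up by merging main, then continues.
git checkout -q feature
git merge -q -m "Merge main into feature" main
echo two > f2.txt
git add f2.txt
git commit -q -m "feature 2"

commit_count=$(git rev-list --count main..feature)
git format-patch -q -o "$repo/out" main..feature
set -- "$repo"/out/*.patch
patch_count=$#
echo "commits on feature: $commit_count, patches written: $patch_count"
```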
I also don't know of any pain to it, though. It's just simple and easy and clean.
The only time it might make sense is if you are following some arbitrary strict style guidelines for commits. Some people care more about the commit history than others, not that either way is necessarily better.
I have a friend, he thought rebasing for linear history was not worth the effort. I told him to do it, because I once had to find a regression over thousands of commits in a merge-heavy code base and it took days. He was not convinced.
Then he had to find a regression. It took over a month.
With git bisect's binary search, it would have taken half a day.
My friend now rebases.
Any given snapshot has a linear history, so it should be as bisectable as the rebased equivalent. What am I missing here?
Not sure what you mean. The key thing of merges is that they... merge... two histories.
A git history graph tool shows that clearly then.
A bisection has to choose whether to go left or right.
It didn't even occur to me that anybody would permit that in their CI.
If you check in commits that don't compile then you can't use automatic bisection effectively (it still does work if that happens rarely, thanks to automatic `git bisect skip`).
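A minimal sketch of what automatic bisection looks like, in a throwaway repo with made-up file names. The check script exits 125 when it can't judge a commit, which `git bisect run` treats as a skip; any other non-zero exit below 128 marks the commit bad.

```shell
#!/bin/sh
# Sketch: let git bisect find the first commit where status.txt stopped saying "ok".
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Eight commits; the "bug" appears in commit 6 and stays.
for n in 1 2 3 4 5 6 7 8; do
  echo "$n" > counter.txt
  if [ "$n" -ge 6 ]; then echo broken > status.txt; else echo ok > status.txt; fi
  git add counter.txt status.txt
  git commit -q -m "commit $n"
done

good=$(git rev-list --max-parents=0 HEAD)   # the root commit is known good
git bisect start HEAD "$good" >/dev/null

# Exit 125 = "cannot test this commit, skip it"; otherwise grep's status decides.
git bisect run sh -c 'test -f status.txt || exit 125; grep -qx ok status.txt' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

In a real project the `sh -c` part would be your build plus whatever test reproduces the regression.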
Of course every non-building commit will make bisecting a merge history even more of a pain, not sure why you think it to be better with merges than with linear history.
> It didn't even occur to me that anybody would permit that in their CI.
How do you enforce it? Are you saying you make your CI compile and run tests for every single commit on a feature branch before allowing it to be merged? That takes a lot of time if you're doing the kind of small commits that make bisection most effective.
> Of course every non-building commit will make bisecting a merge history a pain even more, not sure why you think it to be better with merges than with linear history.
Because with rebase you're much more likely to get a long chain of commits that don't compile. E.g. imagine developer A adds a new feature and starts off by writing some code that calls some function, and meanwhile developer B renames that function in master. Then a while later developer A rebases onto master, fixes their compilation errors, and merges their feature branch in. All of the commits A did in between now don't compile, so you will "git bisect skip" all of them, and if your bisect lands somewhere in that chain of commits you have to do another round of bisection manually or something.
With merge, all of A's commits still compile and you can bisect through to the specific commit that caused the problem. (Maybe one or two isolated commits don't compile because they were never tested on CI, sure, but that's ok - git bisect skip handles them, it's only a problem if you have a long chain of non-compiling commits)
> How do you enforce it?
I don't think you can. You just rely on the developer to only create compiling commits (if possible). Also, code review might catch these.
> Because with rebase you're much more likely to get a long chain of commits that don't compile
After a rebase you try to compile the code and it will fail due to the renamed function. Then you fix the function name and move this change into the commit that started using this function (perhaps employing a fixup commit). Now, all following commits compile because they have the fixed call site, and previous commits compile as well because the call wasn't there yet.
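Roughly, the fixup step could look like this. It's a sketch in a scratch repo, with a text file standing in for the call site and `old_name`/`new_name` as placeholder function names. `GIT_SEQUENCE_EDITOR=true` just accepts the todo list that `--autosquash` prepares, so nothing opens an editor.

```shell
#!/bin/sh
# Sketch: fold a later fix back into the commit that introduced the call site.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "call old_name" > feature.txt
git add feature.txt
git commit -q -m "Add feature call site"
target=$(git rev-parse HEAD)          # the commit that needs the fix

echo more > more.txt
git add more.txt
git commit -q -m "Extend feature"

# The fix, recorded as a fixup of the earlier commit.
echo "call new_name" > feature.txt
git commit -q -a --fixup="$target"

# Reorder and squash the fixup into its target without opening an editor.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash HEAD~3

count=$(git rev-list --count HEAD)
fixed=$(git show HEAD~1:feature.txt)  # content as of "Add feature call site"
echo "commits: $count, call site commit now contains: $fixed"
```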
Right. But there's a natural incentive to create compiling commits as you work (because when you're working on something you at least occasionally compile your code and run tests). There's much less incentive to go back and check after a rebase.
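One way to make that check cheaper, as a sketch: `git rebase --exec` runs a command after every replayed commit and stops at the first one where it fails. Here a `grep` on a made-up `build_status` file stands in for the real compile/test step.

```shell
#!/bin/sh
# Sketch: find the replayed commit that breaks the "build" during a rebase.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo ok > build_status
git add build_status
git commit -q -m "Initial commit"

git checkout -q -b feature
echo parser > parser.txt
git add parser.txt
git commit -q -m "Add parser"
echo broken > build_status
git commit -q -am "Break build"
echo ok > build_status
git commit -q -am "Fix build"

# main moves on, so the rebase really has to replay the feature commits.
git checkout -q main
echo docs > docs.txt
git add docs.txt
git commit -q -m "Add docs"

git checkout -q feature
if git rebase -q --exec 'grep -qx ok build_status' main >/dev/null 2>&1; then
  stopped_at=none
else
  stopped_at=$(git log -1 --format=%s)   # the commit whose check failed
  git rebase --abort
fi
echo "rebase stopped at: $stopped_at"
```

It still costs one build per commit, so it doesn't make the time objection go away. It only removes the need to remember to check.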
> Also, code review might catch these.
Pretty unlikely - usually people just review the overall diff, not the individual commits, and even if they do, the commits make sense visually whether they compile or not.
> Then you fix the function name and move this change into the commit that started using this function (perhaps employing a fixup commit).
If you are disciplined enough to notice and do this right, sure. But it's extra work that eats into your discipline budget.
Git rebase does not destroy history, it just does not link it together. That might be a bad thing. But the individual commits all making an appearance on the destination branch is a good thing.
From those who favor merge, what is bad is that there are no second parent pointers tracking where those changes came from.
This could be obtained by reimplementing git rebase as a sequence of merges. Git rebase is a sequence of zero or more cherry picks, not merges. If git rebase merged each commit instead of cherry picking, each commit would have a parent pointing to its original.
In a git bisect, there would be no need to chase those second parents; you would be looking for which merged commit introduced the breakage, and not care about its original, except in some rare situations where you want to analyze more deeply what went wrong (and then that parent pointer would be a bit handy).
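A quick sketch of the "rebase is a sequence of cherry picks" point: cherry-picking the branch's commits onto main by hand ends up with the same tree and the same commit subjects as `git rebase`. Branch names are hypothetical, and this only covers the simple conflict-free case.

```shell
#!/bin/sh
# Sketch: compare a manual cherry-pick of main..feature with git rebase.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > f1.txt
git add f1.txt
git commit -q -m "feature 1"
echo two > f2.txt
git add f2.txt
git commit -q -m "feature 2"

git checkout -q main
echo main > m1.txt
git add m1.txt
git commit -q -m "main work"

# By hand: replay each feature commit on top of main.
git checkout -q -b picked main
git cherry-pick main..feature >/dev/null

# The same thing via rebase, on a copy of the feature branch.
git checkout -q -b rebased feature
git rebase -q main

picked_tree=$(git rev-parse 'picked^{tree}')
rebased_tree=$(git rev-parse 'rebased^{tree}')
picked_subjects=$(git log --format=%s main..picked)
rebased_subjects=$(git log --format=%s main..rebased)
echo "picked tree:  $picked_tree"
echo "rebased tree: $rebased_tree"
```

Neither result has a second parent pointing back at the original commits, which is the missing link described above.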
Disagree. You can always flatten a commit graph into a linear history if you want, but you can't restore the original commit graph from a linear history. So preserving the original history is objectively better.
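One reading of "flatten" here (my assumption, not necessarily what was meant) is looking at only the first-parent chain, which gives a linear view of main without discarding the merged commits:

```shell
#!/bin/sh
# Sketch: a merge-based history viewed in full and along first parents only.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > f1.txt
git add f1.txt
git commit -q -m "feature 1"
echo two > f2.txt
git add f2.txt
git commit -q -m "feature 2"

git checkout -q main
echo main > m1.txt
git add m1.txt
git commit -q -m "main work"
git merge -q --no-ff -m "Merge feature" feature

all_commits=$(git rev-list --count HEAD)
first_parent_commits=$(git rev-list --count --first-parent HEAD)
git log --oneline --first-parent
```

The full graph is still there for anyone who wants to bisect into the branch.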
Clean history can exist with merges, but I think merging all over the place obviously encourages messy behaviors.
You can't rebase to get back to the original commits, not without knowing what they are.
> Imagine trying to bisect some spaghetti bowl of commits with merges to find the source of a recurring issue.
I do it all the time (well, less so now that I work with a better team where those issues are pretty rare), it's easy, that's the whole point of the git bisect command.
> It would be relatively nonsensical compared to a clean linear history.
Rebased history is much harder to bisect because you often get long chains of commits that don't compile or are otherwise broken.
Git usage is only one part of a wider engineering org. That's like saying "bugless code is objectively better" without considering time to delivery, engineering resources, etc.
Merge versus rebase is just git speak for two different ways of tracking things when diverging streams recombine.
In a nutshell, merge creates a single new commit which brings all the cumulative changes from a source branch onto the target. The commit has two parents: the prior commit on the destination branch and the tip commit of the source branch.
Rebase creates a new commit out of every individual commit on the branch, bringing them individually into the target branch, much as if they were being merged. However, they have only one parent: their target branch lineage.
(Rebasing is way better from a conflict point of view because the changes are individually brought in. A merge creates a "patch bomb" on the destination branch in which several totally unrelated conflicts might have been resolved, pertaining to different commits in the original.)
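The parent difference is easy to see in a scratch repo. A sketch with made-up branch names: the same feature branch is integrated once with a merge and once with a rebase, then the parents are counted.

```shell
#!/bin/sh
# Sketch: a merge commit has two parents; rebased commits each have one.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo one > f1.txt
git add f1.txt
git commit -q -m "feature 1"
echo two > f2.txt
git add f2.txt
git commit -q -m "feature 2"

git checkout -q main
echo main > m1.txt
git add m1.txt
git commit -q -m "main work"

# Variant 1: one new commit on top of main, with two parents.
git checkout -q -b merged main
git merge -q --no-ff -m "Merge feature" feature
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')

# Variant 2: every feature commit is recreated on top of main.
git checkout -q -b rebased feature
git rebase -q main
rebased_commits=$(git rev-list --count main..rebased)
rebased_merges=$(git rev-list --count --merges main..rebased)

merged_tree=$(git rev-parse 'merged^{tree}')
rebased_tree=$(git rev-parse 'rebased^{tree}')
echo "merge parents: $merge_parents"
echo "rebased commits: $rebased_commits (merges among them: $rebased_merges)"
```

Both variants end on the same files. The rebased commits have new hashes, and nothing in them points back to the originals on `feature`.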
And honestly, there are far too many engineers that use rebase without understanding the underlying system which is dangerous in git. (Aside, I wish git would adopt hg’s stages)
[rerere]
enabled = true
autoupdate = true
If you want to have it forget a recorded resolution, for example because you messed something up, there are commands for that, but I seldom use them.
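If you haven't seen it in action, here's a rough sketch in a scratch repo with hypothetical file and branch names: resolve a conflict once, throw the merge away, redo it, and the recorded resolution gets replayed. As far as I know the forgetting command is `git rerere forget <pathspec>`, run while the conflict is present.

```shell
#!/bin/sh
# Sketch: rerere records a conflict resolution and replays it on the next identical conflict.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Same two settings as the config fragment, set for this repo only.
git config rerere.enabled true
git config rerere.autoupdate true

echo original > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b topic
echo topic > file.txt
git commit -q -am "topic change"

git checkout -q main
echo main > file.txt
git commit -q -am "main change"

# First merge conflicts; rerere notes the conflict.
git merge topic >/dev/null 2>&1 || true
echo resolved > file.txt
git add file.txt
git commit -q --no-edit              # the resolution is recorded here

# Throw the merge away and do it again.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true

replayed=$(cat file.txt)
echo "file.txt after second merge: $replayed"
```

The second merge still stops so you can review the result before committing.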
I’ve never run into any particular pitfalls to speak of. I mostly just turn it on and forget it’s there. You can still always go back in time with the reflog if needed.
I don't think that rerere offers fine grained enough control over "forgetting". What I needed (until I realized my mistake) was a way to clear any memory of a resolution for the current conflict in path.
Either way, even simpler, imho, than any log that one has to comb through after the fact is to create a named backup
branch=$(git branch --show-current) && git switch -c backup-${branch} && git switch -
Carry on as planned and if you bork it all, switch to the backup branch which retains the original commits and all, delete the borked one and have another go
git switch backup-somebranch && git branch -D somebranch && git branch -m somebranch
Note: First I thought that `ORIG_HEAD` was the thing. But that won’t work if you did `git reset` during the rebase.
(`ORIG_HEAD` is probably “original head”, not “origin head” (like the remote) that I first thought…)
[1] You just have to comb through documentation!
Dropbox doesn't have a notion of uncommitted data. Why should source control?
Pick your most important repo. Make sure everything is committed. Do something stupid like `git reset --hard HEAD~100`. Look how fucked your work is. Do `git reset --hard HEAD@{1}`. Look at how nothing was lost.
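If you'd rather not try that on your most important repo first, here's the same experiment as a sketch in a throwaway one (three commits instead of a hundred):

```shell
#!/bin/sh
# Sketch: undo a hard reset via the reflog in a scratch repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
before=$(git rev-parse HEAD)

# The "stupid" step: drop the last two commits from the branch.
git reset -q --hard HEAD~2
after_reset=$(git rev-parse HEAD)

# HEAD@{1} is where HEAD pointed before the reset.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)

echo "before:      $before"
echo "after reset: $after_reset"
echo "restored:    $restored"
```

This only works for committed work. Uncommitted changes wiped by `--hard` never made it into the reflog.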
Among its other virtues, reflog makes safe the highly empowering 'git commit --amend'.
git rebase --onto
I don't think they are the first ones to open-source their code on a timer, but they are trying to popularize the idea.