Resolving merge conflicts when introducing formatting to an existing codebase

On September 22, 2022 by Sosthène Guédon

If a project doesn't use any formatting tool, introducing one can be a headache, and is almost guaranteed to cause merge conflicts with any ongoing PR. Here's how to fix them.

Context

At some point in the life of a project, you might want to introduce code formatting. If no formatter was used from the beginning, the overall formatting probably doesn't follow any convention, which makes the code harder to read and to maintain: developers have to remember to turn off their format-on-save and format their modifications by hand.

While this applies to almost all programming languages and formatting tools, the examples assume you're using Rust and cargo fmt. For your use case, replace the cargo fmt command with your tool of choice.

Let's assume:

  • You're using git
  • This repository has many active branches (pending Pull Requests for example)
  • Running cargo fmt on the latest commit on main changes almost every file in the project, causing merge conflicts with every active PR.
  • You created a commit on top of main that runs cargo fmt for the first time. This commit must not add any modification to the code. This commit may add a couple configuration files.
  • All branches branch off the commit just before cargo fmt. If this isn't the case, you should use git rebase <commit before cargo fmt> to make it work.
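
The last assumption is easy to check before going any further. Here's a minimal sketch: forks_before_fmt is a name made up for this example (it is not a git command), and it only compares the merge base of the branch with the parent of the formatting commit.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# Succeeds when <branch> forks from the commit just before <fmt-commit>.
# Usage: forks_before_fmt <branch> <fmt-commit>
forks_before_fmt() {
  branch=$1
  fmt_commit=$2
  # Most recent commit shared by the branch and the formatting commit
  base=$(git merge-base "$branch" "$fmt_commit") || return 2
  # Commit right before the formatting commit
  parent=$(git rev-parse "$fmt_commit~1") || return 2
  [ "$base" = "$parent" ]
}
```

With the commit IDs used in this post, forks_before_fmt feature-branch 393fd90 would succeed only if feature-branch forks from 0467320. If it fails, rebase the branch onto the commit before cargo fmt first.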

This post presents two solutions to the merge conflicts: an easy but imperfect one that uses merge commits, and one that's way too complex and abuses rebase to get a perfect git history.

The easy solution with git merge

How-to

With the previous assumptions, this is what the tree looks like:

              Head of the branch to merge into main (called feature-branch)
 main branch             ... The many commits of the branch you want to merge
      │                   │            
     ...                  │
      │                   │            
393fd90 Cargo fmt         │            
      │                   │            
0467320 common ancestor ──┘            

The easy way is to simply merge main into feature-branch. This will cause a ton of conflicts, but they can be resolved easily with git's merge strategies:

git merge -s ours 393fd90 called from the feature-branch

The ours strategy records the merge but ignores every change coming from the other side: the resulting tree is exactly the one of feature-branch. This essentially reverts all the formatting done in 393fd90 (the cargo fmt commit). You can then run cargo fmt again, and use git commit --amend to apply the results to the merge commit.

This gets you the following tree:

  Head of main       Head of feature-branch
      │                   │            
     ...                  │            
      │                   │            
      ├─── Merge branch 'main' into 'feature-branch'
      │                   │            
      │                   │            
      │                  ... The many commits of the branch you want to merge
      │                   │            
      │                   │
      │                   │            
393fd90 Cargo fmt         │            
      │                   │            
0467320 common ancestor ──┘            

Normally, git should then allow you to merge feature-branch into main without issues.
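
The steps of this section can be sketched as a single function. This is only an illustration under the assumptions listed earlier: merge_fmt_commit and FMT_CMD are names made up for this example, and the -a flag on git commit is my addition so that the reformatted files get staged. Keep in mind that -s ours also leaves out any file added by 393fd90, such as a formatter configuration file, so you may have to restore it by hand before formatting.

```shell
#!/bin/sh
# FMT_CMD is a name made up for this sketch; it defaults to the formatter
# used in this post and can be overridden for other tools.
FMT_CMD=${FMT_CMD:-cargo fmt}

# Run from the feature branch. $1 is the formatting commit (393fd90 above).
merge_fmt_commit() {
  # 1. Record the merge while keeping the feature branch's files as they are.
  git merge -s ours --no-edit "$1" || return 1
  # 2. Re-apply the formatting to the feature branch's code.
  $FMT_CMD || return 1
  # 3. Fold the formatted files into the merge commit.
  git commit -a --amend --no-edit
}
```

From feature-branch, this would be called as merge_fmt_commit 393fd90.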

Why you should avoid this

There are some issues with this strategy:

  • This assumes that feature-branch's ancestor is the commit just before 393fd90 in main. If this is not the case, -s ours risks deleting any work done in the commits in between.
  • Some repositories do not accept merge commits that resolve conflicts and expect branches to be rebased instead. This is the case for the Rust project, for example. Having a linear history makes git blame and git bisect more efficient.

The over-engineered rebase solution

The objective

The ideal solution would be to use git rebase 393fd90 from feature-branch before merging. This gives us the following initial tree (left) and our objective (right):

                 Head of feature-branch                 Head of feature-branch 
                          │                                     │
                         ...                                   ...
                          │                                     │
               <commit-id> Many commits              <commit-id> Many commits
                          │                                     │
393fd90 Cargo fmt         │                 393fd90 Cargo fmt ──┘
      │                   │                      │
0467320 common ancestor ──┘              0467320 common ancestor
      │                                          │

This would require us to rewrite every commit of the feature branch, as if it had been written with formatting applied at each commit from the beginning. This is annoying but can be easily automated.

How-to

Here's a magical shell script:

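#!/bin/sh
# Usage: rebase-script.sh <commit>
# Re-creates <commit> on top of HEAD with formatting applied.
# Caveat: file names containing whitespace are not handled by the $(...) expansions.
git rm $(git ls-tree --full-tree -r --name-only "$1"\~1)  && \
  git checkout "$1" -- . && \
  cargo fmt && \
  git add $(git ls-tree --full-tree -r --name-only "$1")  && \
  git commit -C "$1"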

Store this script somewhere on your PATH as rebase-script.sh (or use its full path in the exec lines below), make it executable, and run git rebase -i 393fd90 (replacing the commit ID with that of your own cargo fmt commit).

This should open your editor for an interactive rebase:

pick 1fc6c90 Created main loop & timing control
pick 6b24810 Enabled config file parsing
pick dd14750 Misc bugfixes
pick c619260 Code additions/edits
pick fa39180 More code
pick 4ca2ac0 AAAAAAAA
pick 7b36970 AJKFJSLKDFJDKLFJ
pick 1952fb0 My hands are typing words

Delete the commit messages and replace each pick with exec followed by the script name, keeping the commit hash as the script's argument. This should give you:

exec rebase-script.sh 1fc6c90
exec rebase-script.sh 6b24810
exec rebase-script.sh dd14750
exec rebase-script.sh c619260
exec rebase-script.sh fa39180
exec rebase-script.sh 4ca2ac0
exec rebase-script.sh 7b36970
exec rebase-script.sh 1952fb0

Save, quit, wait for git to work its magic... and voilà! You now have a rebased branch that looks just as if it had always been built with a formatting tool!
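
Editing the todo list by hand gets tedious on long branches. Here's a small sketch of how the rewrite could be automated: picks_to_exec is a name made up for this example, and it assumes the script is called rebase-script.sh as above. It reads a todo list on standard input and only rewrites the pick lines.

```shell
#!/bin/sh
# Hypothetical helper: turns every "pick <hash> <subject>" line read from
# stdin into "exec rebase-script.sh <hash>". Other lines pass through as-is.
picks_to_exec() {
  sed -E 's/^pick ([0-9a-f]+).*$/exec rebase-script.sh \1/'
}
```

The same sed expression can probably be plugged into git directly through the GIT_SEQUENCE_EDITOR environment variable, which names the command git uses to edit the todo list. With GNU sed, that would mean setting it to the expression above with the -i flag added before running git rebase -i 393fd90. I have only sketched this, so check the resulting todo list on a throwaway branch first.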

Why it works

Normally, when rebasing with the default pick option, git tries to apply each commit on top of the current branch. In our case, this would fail due to conflicts. Here are the steps taken by the script instead:

  1. git rm $(git ls-tree --full-tree -r --name-only "$1"\~1) removes all files tracked by git before the application of the current commit. To understand it, let's expand it. When $1 is replaced by the hash given to the script, it looks like this: git rm $(git ls-tree --full-tree -r --name-only 1fc6c90~1). 1fc6c90~1 means 1 commit before 1fc6c90, so this command lists all the files tracked in 1fc6c90's parent and deletes them.
  2. git checkout $1 -- . then tells git to load every file from 1fc6c90 into the current directory.
  3. cargo fmt runs the formatting.
  4. git add $(git ls-tree --full-tree -r --name-only "$1") adds every file that might have been modified by the previous step. We use git ls-tree again (this time without ~1) to avoid adding files that aren't meant to be tracked (for example the script itself).
  5. Finally, git commit -C "$1" creates a new commit with the same metadata (message, author and date) as the commit it is replacing. Because every file was removed in step 1 and only the files present in 1fc6c90 were restored in step 2, files deleted by 1fc6c90 are not committed. This is why step 1 is important.
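
Since step 5 relies on git commit -C keeping the metadata, it can be reassuring to check the result once the rebase is done. Here's a rough sketch of such a check: commit_metadata and same_metadata are names made up for this example, and the comparison only looks at subject, author name and author date.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git.
# Print "subject|author|author date" for each commit in <base>..<tip>,
# oldest first.
commit_metadata() {
  git log --reverse --format='%s|%an|%ad' "$1..$2"
}

# Succeeds when both commit ranges carry identical metadata.
# Usage: same_metadata <old-base> <old-tip> <new-base> <new-tip>
same_metadata() {
  old=$(commit_metadata "$1" "$2") || return 2
  new=$(commit_metadata "$3" "$4") || return 2
  [ "$old" = "$new" ]
}
```

With the commit IDs of this post, the call would look like same_metadata 0467320 <old tip> 393fd90 feature-branch, where <old tip> is the head of the branch before the rebase (a backup branch created beforehand, for example).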

Wrap-up

This is something I've had to deal with, and I hope this can help someone in the future.

If you don't understand something or believe something should be added, please email me at sosthene@guedon.gdn or reach me on Mastodon: @sgued@pouet.chapril.org