XITASO Tech Talk: History Rewriting in Git

Fair Warning

Do not rewrite published history!

In this context, published means any history that others already have access to. Usually, this means anything you have pushed to a shared repository.

If anyone else has based work on the history you are rewriting, they will need to manually fix their history afterwards. While this is possible, avoid it if you can.

For information and help if you have to rewrite published history, or if you are affected by someone else doing it, see the section “Recovering from upstream rebase” of the git-rebase man page.

Rewriting History – Rewriting Philosophy?

Opinions differ on whether you should rewrite history at all, and if so, how liberally. Simplifying for brevity, there are two views:

History should be a record of what actually happened. Consequently, changing history is equivalent to lying about what happened and should be avoided. Rewriting should be reserved as a last resort, e.g. when there are serious security concerns.

History should be the story of how your project was made, and you wouldn't publish the first draft of a book either. Rewriting history is an accepted means of improving that story, making it clear and consistent.

Amending Commits

Amending is only applicable to HEAD, i.e. the last commit on a branch. Care should be taken if other branches are already based on yours, since amend will not touch those (if you need this, scroll down to “Rebasing” – but keep our Fair Warning in mind).

If you want to fix a typo in your commit message, or you forgot to mention the ticket number, you can easily do this:

  • git commit --amend -m "<new message>"

You can also add or remove files that you forgot, for example when you didn't save them before committing:

  • git add / git rm as needed
  • git commit --amend
  • If you don't pass a message with -m, Git opens the editor pre-filled with the old message
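The amend workflow can be tried out safely in a throwaway sandbox. The script below is a minimal sketch, assuming a POSIX shell and Git 2.28 or newer (for git init -b); the file names, the commit messages and the ticket number TICKET-123 are made up for illustration. It first fixes the message of HEAD, then amends a forgotten file into the same commit, using --no-edit to keep the message.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"         # local identity, only for this sandbox
git config user.email "demo@example.com"

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add featur"            # typo, and the (hypothetical) ticket number is missing

# Replace HEAD with a commit that has the corrected message.
git commit -q --amend -m "Add feature (TICKET-123)"

# A file was forgotten: stage it and amend again, keeping the message.
echo "docs" > README.txt
git add README.txt
git commit -q --amend --no-edit

git log --oneline                         # still a single commit
```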

If you want to “uncommit” the last commit, in order to edit the files again and create a new commit from them instead, you can use:

  • git reset HEAD~
  • The changes of the undone commit stay in your working copy, but are no longer staged
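To see what "uncommitting" does to the working copy, here is a small sketch, assuming a POSIX shell and Git 2.28 or newer; it runs in a disposable repository, and the file name and messages are invented.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > file.txt && git add file.txt && git commit -q -m "First"
echo "v2" > file.txt && git commit -q -am "Second (to be undone)"

# Move the branch back by one commit. The default (--mixed) reset keeps
# the changes in the working copy but removes them from the index.
git reset -q HEAD~

git log --oneline                         # only "First" remains
cat file.txt                              # still contains "v2"
git status --short                        # " M file.txt": modified, not staged
```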

Merging by Squashing

If you want to combine the changes of an entire (feature) branch into a single commit and apply it to HEAD, git-merge has an option for this:

  • git merge --squash <feature branch>
  • Note: --squash only stages the combined changes and never creates a commit (let alone a merge commit) → you have to commit manually with git commit

The typical use-case for merge-squashing is making the history cleaner and more presentable. For example, you might prefer to have a single fix/feature commit instead of 20 trial-and-error commits.
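The following sketch squash-merges a hypothetical feature branch with three trial-and-error commits into a single commit on master. It assumes a POSIX shell and Git 2.28 or newer and runs in a disposable repository; all names and messages are invented.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt && git add app.txt && git commit -q -m "Initial commit"

# A feature branch with several trial-and-error commits.
git checkout -q -b feature
for i in 1 2 3; do
  echo "attempt $i" >> app.txt
  git commit -q -am "WIP attempt $i"
done

git checkout -q master
git merge --squash feature               # stages the combined changes, does not commit
git commit -q -m "Add feature"           # the single commit is created manually

git log --oneline master                 # "Add feature" on top of "Initial commit"
```

Note that the new commit on master has only one parent: Git does not record that it came from feature.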

Cherry-Picking

In the context of Git, cherry-picking means taking the changes of a given commit (or sometimes multiple commits) and applying them on top of HEAD. Typically, you would use this to introduce specific changes made on another branch. This may be needed to apply a hotfix to the mainline or a feature branch, or to get specific bug fixes early on other feature branches.

“Taking changes and applying them” is quite literal – basic cherry-picking is equivalent to creating a patch from the changes you want and applying that patch on your HEAD.

The command itself is quite easy:

  • git cherry-pick <commit> ...
  • Option -e: edit commit message before committing
  • Option -x: appends “(cherry picked from commit …)” to new commit message
  • Option -n: only apply the changes to the working copy, do not commit

You can pass git-cherry-pick multiple commits, specified singly or as a commit range – it will then create a new commit for each commit passed in (in order).
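A minimal sketch of picking a single hotfix from a feature branch with -x, assuming a POSIX shell and Git 2.28 or newer; the repository is a throwaway one, and the branch, file and commit names are made up. Only the selected commit is applied to master; the unfinished commit on the branch stays behind.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "Fix crash on startup"
echo "wip" > wip.txt && git add wip.txt && git commit -q -m "Unfinished work"
fix=$(git rev-parse feature~1)           # the hotfix commit we want on master

git checkout -q master
git cherry-pick -x "$fix"                # creates a new commit with the same changes
git log -1 --format=%B                   # ends with "(cherry picked from commit <hash>)"
```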

Rebasing

When you want to move one or more commits onto a different base commit, git-rebase is the tool to use. It can be used as an alternative to merging: where a merge recombines two divergent lines of history, a rebase takes one side of the diverged history and replays it on top of the other. The result is a (more) linear history, as opposed to a “mergy” one.

Similar to cherry-picking, rebasing moves the changes of the selected commits (not the snapshots), so it’s a process where the diff of each commit to its parent is applied in turn.

A notable difference between rebasing and cherry-picking is which side of the moving history you are on. Let’s say you have a branch master and a branch feature, with feature having 2 commits you want to transplant onto master. When you are on master and cherry-pick the 2 commits from feature, the changes are applied on master, and master is moved up to the second new commit that is created; feature remains unchanged. Conversely, when you are on feature and rebase onto master, the two new commits are created on top of master, but feature is then moved to the second new commit; master remains unchanged. (While cherry-picking and rebasing can be used to produce the same results, especially in such a simple situation, rebasing is more powerful in general.)
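The master/feature scenario above can be reproduced in a scratch repository. This sketch assumes a POSIX shell and Git 2.28 or newer, and all file names and messages are invented. It rebases feature onto master and records both branch tips beforehand, so you can check that only feature moved.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
echo "a" > a.txt && git add a.txt && git commit -q -m "Feature 1"
echo "b" > b.txt && git add b.txt && git commit -q -m "Feature 2"

git checkout -q master
echo "m" > m.txt && git add m.txt && git commit -q -m "Master work"   # history diverges

master_before=$(git rev-parse master)
feature_before=$(git rev-parse feature)

git checkout -q feature
git rebase -q master                      # re-applies "Feature 1" and "Feature 2" on top of master

git log --oneline --graph master feature  # linear: Base, Master work, Feature 1, Feature 2
```

Running the equivalent git cherry-pick from master instead would leave feature where it is and move master.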

You can use rebase in the following ways:

  • git rebase <target>
    • Move all commits that are in HEAD but not in <target> onto <target>, and update HEAD (and the branch it points to, if any)
  • git rebase <target> <source>
    • Shortcut for git checkout <source> && git rebase <target>
    • Remains checked out on <source> afterwards
  • git rebase --onto <target> <source_from> <source_until>
    • Move commits that are in <source_until> but not in <source_from> onto <target>
    • That is, allows you to select commit ranges to transplant
    • Remains checked out on <source_until> afterwards

Note: man gitrevisions has details on commit ranges in Git, including some illustrative examples.
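A sketch of git rebase --onto, modelled on the master/next/topic example in the git-rebase man page: topic was branched off next by mistake, and only topic's own commit should move onto master. It assumes a POSIX shell and Git 2.28 or newer and uses a throwaway repository with invented content.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b next
echo "n" > next.txt && git add next.txt && git commit -q -m "Next work"

git checkout -q -b topic                 # topic accidentally started from next
echo "t" > topic.txt && git add topic.txt && git commit -q -m "Topic work"

# Move the commits that are in topic but not in next onto master.
git rebase -q --onto master next topic

git log --oneline master..topic          # only "Topic work"
```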

Interactive Rebasing

As an extended form of rebasing, interactive rebasing lets you selectively move, reorder, pick, remove, squash and edit commits.

To start an interactive rebase, use any of the rebase commands in the previous section with the -i/--interactive switch. This opens an editor presenting you with a list of affected commits, where you can specify an action for each one. The editor content also contains some curt instructions on the available actions.

If you save and close the actions list as-is (every commit has the “pick” action), the resulting rebase is exactly the same as a non-interactive rebase.

Some of the other things you can do with interactive rebase, even if you're not moving commits onto another branch (e.g. git rebase -i HEAD~3 to work on the last three commits in place):

  • Fix up a commit that is not the latest on the branch (i.e. where amending is not possible). This includes both content changes and fixes to the commit message.
  • Reorder and remove commits (see below).
  • Squash some commits, giving you finer-grained control than squash-merging.
  • Split commits apart, e.g. to turn one big commit into multiple groups of logically connected changes.
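Interactive rebases are normally driven by hand in the editor. To keep the following sketch scriptable, it sets GIT_SEQUENCE_EDITOR to a sed command that turns every "pick" after the first into "squash", and GIT_EDITOR=true to accept the combined commit message unchanged; both are stand-ins for manual editing. It assumes a POSIX shell, sed and Git 2.28 or newer, and uses invented commits in a disposable repository.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"
echo "parser v1" > parser.txt && git add parser.txt && git commit -q -m "Add parser"
echo "parser v2" > parser.txt && git commit -q -am "Parser: handle empty input"
echo "parser v3" > parser.txt && git commit -q -am "Parser: fix typo"

# By hand, you would run "git rebase -i HEAD~3" and change the todo list to:
#   pick   <hash> Add parser
#   squash <hash> Parser: handle empty input
#   squash <hash> Parser: fix typo
# Here sed performs that edit, and GIT_EDITOR=true keeps the combined message.
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$ s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~3

git log --oneline                        # "Base" plus one combined parser commit
```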

Once you save and exit the editor, Git applies the actions you specified from top to bottom. This means two things. One, you can re-order changes if you want, e.g. to group commits that belong together. Two, only commits contained in the actions list are acted on, allowing you to “remove” commits that you no longer need.
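Reordering and removing can be scripted the same way for experimentation. In this sketch, a small helper script (reorder.sh, hypothetical, written only for the demo) plays the role of the person editing the todo list: it puts the third commit first, keeps the first commit, and leaves out the second one, which is therefore dropped. It assumes a POSIX shell and Git 2.28 or newer, and works in a throwaway directory.

```shell
set -e
work=$(mktemp -d)                        # throwaway directory: repo plus helper script
git init -q -b master "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"
for name in A B C; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Commit $name"
done

# Hypothetical stand-in for manual editing: the todo file ($1) lists
# A, B, C (oldest first); rewrite it to C, A so that B is dropped.
cat > "$work/reorder.sh" <<'EOF'
#!/bin/sh
grep '^pick' "$1" > "$1.picks"
{ sed -n 3p "$1.picks"; sed -n 1p "$1.picks"; } > "$1"
EOF
chmod +x "$work/reorder.sh"

GIT_SEQUENCE_EDITOR="$work/reorder.sh" git rebase -q -i HEAD~3

git log --format=%s                      # Commit A, Commit C, Base (newest first)
```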

(Side note: As you may know, “removed” commits are not immediately deleted from the repo. Using tools like Git’s reflog, you can find unreferenced history for some time, if you need to restore something that’s no longer on any of your branches.)
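A small sketch of such a recovery, assuming a POSIX shell and Git 2.28 or newer in a throwaway repository: a commit is "lost" with git reset --hard, then found again through the reflog and given a new branch (the name rescued is arbitrary).

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > notes.txt && git add notes.txt && git commit -q -m "First"
echo "two" >> notes.txt && git commit -q -am "Important work"

git reset -q --hard HEAD~                # no branch points to "Important work" anymore

git reflog -n 3                           # HEAD@{1} is where HEAD was before the reset
git branch rescued 'HEAD@{1}'            # reference the unreferenced commit again
git log --oneline rescued
```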

Branch Filtering

There may come a day when all the other history rewriting tools in your arsenal just don’t cut it. Git’s filter-branch is a powerful tool that may help you that day.

Filter-branch allows you to process some branches or the entire history of your repository, commit by commit, and apply a command or filter to each commit.

Some common use-cases are:

  • Remove a file from every commit. For example, some sensitive data may be in the repo, which was fine while used internally, but now you want to grant more public access to that repo. Or your repo size has ballooned because big binary files were added, and you want to remove them to clean up.
  • Make a subfolder the new repository root. For instance, a component of your project has gotten more important and should be moved to its own repository, but you want to retain any history related to it. You could duplicate your project repository, and in the new copy make the component’s folder the new root.
  • Change the email address of one or more committers in all their commits. This may be required when you import a repository from some other VCS into Git.

The general command pattern is:

  • git filter-branch <filter(s)> -- <revision selection>

Some more concrete commands for the above examples:

  • To remove the file path/to/file from every commit:
    git filter-branch --index-filter 'git rm --cached --ignore-unmatch path/to/file' -- --all
  • To make the folder src/component/core the new root folder:
    git filter-branch --subdirectory-filter src/component/core -- --all
  • To change the email address mary@olddomain.com to mary.jane@newdomain.net on all commits:
    git filter-branch --env-filter '
    	if test "$GIT_AUTHOR_EMAIL" = "mary@olddomain.com"
    	then
    		GIT_AUTHOR_EMAIL=mary.jane@newdomain.net
    		export GIT_AUTHOR_EMAIL
    	fi
    	if test "$GIT_COMMITTER_EMAIL" = "mary@olddomain.com"
    	then
    		GIT_COMMITTER_EMAIL=mary.jane@newdomain.net
    		export GIT_COMMITTER_EMAIL
    	fi
    ' -- --all

We passed --all after a double dash to select all branches and tags (i.e. the entire referenced history); the double dash separates filter-branch's own options from the revision selection. Instead of --all, you can also specify one or more revisions. More details on this are given in the git-filter-branch man page.
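The file-removal example can be tried on a scratch repository before touching a real one. This sketch assumes a POSIX shell and Git 2.28 or newer; secrets.txt and the commits are invented. Recent Git versions print a warning recommending other tools and pause briefly before filter-branch runs; setting FILTER_BRANCH_SQUELCH_WARNING=1 skips that for the demo.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "app" > app.txt && git add app.txt && git commit -q -m "Initial commit"
echo "hunter2" > secrets.txt && git add secrets.txt && git commit -q -m "Add config"
echo "more" >> app.txt && git commit -q -am "More work"

export FILTER_BRANCH_SQUELCH_WARNING=1  # skip the warning and pause, demo only
git filter-branch --index-filter 'git rm --cached --ignore-unmatch secrets.txt' -- --all

git log --oneline -- secrets.txt          # no output: no commit on master touches the file
git for-each-ref refs/original/           # backup of the old refs kept by filter-branch
```

filter-branch keeps the original refs under refs/original/, so the old history (including the file) stays reachable until those backup refs are deleted and the reflog entries expire.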

In that man page, you can find a list of all filters that filter-branch supports and more usage examples.

Resources all around Git