Apache Arrow Git Tips

I’m used to working with git merge, and this has made life difficult for me when working on Apache Arrow: the project doesn’t use a merge model, and pull request branches often need to be rebased against master and force pushed.

There are numerous ways to work in this model but this article documents the approach I use, based on some guidance I was given on one of my PRs. I’m documenting this for my own benefit but hopefully it helps others too.

Creating a Fork

When you first fork the repo, you need to set up the upstream apache git repo as a remote. I prefer to name this remote apache rather than upstream since it is more explicit, especially if I add other remotes (such as other contributors’ forks).

git remote add apache https://github.com/apache/arrow
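To check that the remote was registered, something like the following should work. This is a minimal sketch that runs in a throwaway repository (the scratch directory is an assumption for illustration only); in a real clone of your fork you would run just the git remote and git config lines.

```shell
# Scratch repository standing in for a clone of your fork (illustration only).
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Register the upstream Apache repository under the name "apache".
git remote add apache https://github.com/apache/arrow

# List every remote with its fetch and push URLs.
git remote -v

# Read back the URL stored for the apache remote.
apache_url=$(git config --get remote.apache.url)
echo "$apache_url"
```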

Keeping your fork’s master up to date

First of all, we need to know how to keep our fork up to date. Do not use the standard GitHub fork workflow. Do not use the word merge ever. Do this instead (on your fork), bearing in mind that git reset --hard discards any local commits and uncommitted changes on master:

git checkout master
git fetch apache
git reset --hard apache/master
git push -f
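Because the reset throws away anything on your local master that upstream does not have, it can be worth listing those commits first. The sketch below tries this in a throwaway sandbox, where local directories stand in for apache/arrow and for your GitHub fork. The directory names, the demo identity, and the explicit origin master on the push are assumptions made so the example is self-contained.

```shell
# Throwaway sandbox: local repos stand in for apache/arrow and your fork.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"                           # stand-in for apache/arrow
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master                # make sure the first branch is master
git commit -q --allow-empty -m "initial commit"

git clone -q --bare "$work/upstream" "$work/fork.git"  # stand-in for your fork
git clone -q "$work/fork.git" "$work/clone"            # your local clone of the fork
cd "$work/clone"
git remote add apache "$work/upstream"

# Simulate a stray commit on local master and new work landing upstream.
git commit -q --allow-empty -m "stray local commit"
git -C "$work/upstream" commit -q --allow-empty -m "new upstream commit"

# Before resetting: list commits on master that apache/master does not have.
git fetch -q apache
git log --oneline apache/master..master                # lists the stray commit

# The update sequence from this section; the stray commit is discarded.
git checkout -q master
git reset -q --hard apache/master
git push -q -f origin master
```

If the git log line prints anything you want to keep, move it to a feature branch before running the reset.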

Working on a feature branch

It doesn’t matter what name you choose for your branch but I recommend using the JIRA identifier since it makes it easier to keep track.

git checkout -b ARROW-1234

Go ahead and push commits to this branch. When you are ready to create a PR it is quite likely that other commits have been merged to master, so it’s a good idea to rebase to make sure there are no conflicts. I recommend reading this article about git rebase.

Run the following commands to rebase your feature branch. First we get our master up to date …

git checkout master
git fetch apache
git reset --hard apache/master
git push -f

… and then we can rebase our feature branch and force push …

git checkout ARROW-1234
git rebase master
git push -f
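To see what the rebase does to history, the whole sequence can be tried in a throwaway sandbox. This is a sketch under stated assumptions: the local directories standing in for apache/arrow and your fork, the demo identity, and the file names are all made up for illustration. It names master explicitly in the rebase, since a bare git rebase targets the branch's configured upstream, which after git push -u is the branch on your fork rather than master.

```shell
# Throwaway sandbox: local repos stand in for apache/arrow and your fork.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"                           # stand-in for apache/arrow
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"
git clone -q --bare "$work/upstream" "$work/fork.git"  # stand-in for your fork
git clone -q "$work/fork.git" "$work/clone"
cd "$work/clone"
git remote add apache "$work/upstream"

# Work on a feature branch and push it to the fork.
git checkout -q -b ARROW-1234
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "ARROW-1234: add feature"
git push -q -u origin ARROW-1234

# Meanwhile, another change lands upstream.
echo "upstream change" > "$work/upstream/upstream.txt"
git -C "$work/upstream" add upstream.txt
git -C "$work/upstream" commit -q -m "upstream change"

# Bring master up to date, as in the previous section.
git checkout -q master
git fetch -q apache
git reset -q --hard apache/master
git push -q -f origin master

# Replay the feature commit on top of the new master and force push.
git checkout -q ARROW-1234
git rebase -q master
git push -q -f origin ARROW-1234

# History is now linear: the upstream change, then the feature commit.
git log --oneline --graph ARROW-1234
```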

Note that force pushing a branch does not send a notification to people watching the PR, so you will need to write a comment on the PR to alert reviewers.

Recovering from an accidental merge

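If, like me, you use git merge all day long in your day job, it is inevitable that you will merge master into your feature branch by accident when working on Arrow (thanks, muscle memory). It is possible to recover using this set of commands, which squash your changes into a single commit on top of the latest apache/master: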

git checkout -b ARROW-1234-squash
git fetch apache
git reset --hard apache/master
git merge --squash ARROW-1234
git commit -m "commit message here"
git checkout ARROW-1234
git reset --hard ARROW-1234-squash
git push -f
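The recovery can be rehearsed safely in a throwaway sandbox before trying it on a real PR branch. The sketch below is an illustration under assumptions: local directories stand in for apache/arrow and your fork, and the demo identity, file names, and commit messages are made up. It makes the accidental merge, runs the commands above, and then counts merge commits to confirm the branch is clean.

```shell
# Throwaway sandbox: local repos stand in for apache/arrow and your fork.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"                           # stand-in for apache/arrow
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"
git clone -q --bare "$work/upstream" "$work/fork.git"  # stand-in for your fork
git clone -q "$work/fork.git" "$work/clone"
cd "$work/clone"
git remote add apache "$work/upstream"

# A feature commit, followed by a new commit upstream.
git checkout -q -b ARROW-1234
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "ARROW-1234: add feature"
echo "upstream change" > "$work/upstream/upstream.txt"
git -C "$work/upstream" add upstream.txt
git -C "$work/upstream" commit -q -m "upstream change"

# The accidental merge: this creates a merge commit on the feature branch.
git fetch -q apache
git merge -q --no-edit apache/master
git push -q -u origin ARROW-1234

# Recovery, following the commands above.
git checkout -q -b ARROW-1234-squash
git fetch -q apache
git reset -q --hard apache/master
git merge -q --squash ARROW-1234
git commit -q -m "ARROW-1234: add feature"
git checkout -q ARROW-1234
git reset -q --hard ARROW-1234-squash
git push -q -f origin ARROW-1234

# Count merge commits on the branch that are not in apache/master.
git rev-list --merges --count apache/master..ARROW-1234

# The temporary branch is no longer needed.
git branch -q -D ARROW-1234-squash
```

Note that the squash collapses every commit on the feature branch into one, so the individual commit messages are replaced by the single message you supply.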