paazmaya.fi

The Website of Juga Paazmaya | Stories about Web Development, Japanese Martial Arts, Hardware prototyping and travelling

Speed up Mercurial version control with LZ4 compression, theoretically at least

LZ4 is a compression algorithm that promises significantly faster compression and decompression than similar methods such as deflate or LZMA. See the LZ4 home page for benchmarks.
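To get a rough local feel for this trade-off, the following Python 3 sketch times a compress-and-decompress round trip with the standard library's zlib (deflate) and lzma modules. It includes LZ4 only if the third-party lz4 package, which is installed further down, can be imported. The `benchmark` and `run_all` helpers and the sample payload are hypothetical, and real revlog data will behave differently:

```python
import lzma
import time
import zlib

try:
    import lz4.frame  # optional third-party python-lz4 package
except ImportError:
    lz4 = None


def benchmark(name, compress, decompress, data, repeats=5):
    """Return (name, compressed size in bytes, best round-trip time in seconds)."""
    best = float("inf")
    packed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        packed = compress(data)
        restored = decompress(packed)
        best = min(best, time.perf_counter() - start)
        if restored != data:
            raise ValueError(f"{name} round trip failed")
    return name, len(packed), best


def run_all(data):
    """Benchmark every codec available in this interpreter on the same data."""
    codecs = [
        ("deflate (zlib)", zlib.compress, zlib.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    if lz4 is not None:
        codecs.append(("lz4 (frame)", lz4.frame.compress, lz4.frame.decompress))
    return [benchmark(name, comp, decomp, data) for name, comp, decomp in codecs]


if __name__ == "__main__":
    # Hypothetical payload: repetitive text loosely resembling source code.
    sample = b"def function(argument):\n    return argument * 2\n" * 5000
    for name, size, seconds in run_all(sample):
        print(f"{name:16} {size:8d} bytes {seconds * 1000:8.2f} ms")
```

Taking the best of several runs reduces the effect of one-off hiccups. Compressed size matters as well, because the size cost of LZ4 comes up again later in this post.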

Facebook has open sourced a Mercurial extension that uses this compression method. The extension is called lz4revlog and is available on Bitbucket. Once it has been enabled, all revlog deltas in new clones are compressed with LZ4 and should therefore gain the speed improvements. More about the Mercurial revlog and its format.

Before the extension can be used, the Python bindings for LZ4 need to be installed. The easiest way is most likely via pip, which itself needs to be installed first.

pip install lz4

On Windows, the full path to the pip executable might need to be given:

C:\Python27\Scripts\pip.exe install lz4
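Before moving on, it may be worth confirming that the bindings can be imported by the same Python interpreter that Mercurial runs on. Below is a minimal Python 3 sketch with hypothetical helper names. It assumes the package provides the `lz4.frame` module, as recent python-lz4 releases do; older releases exposed a different API:

```python
import importlib.util


def lz4_bindings_available():
    """Return True if the lz4 package and its frame module can be imported."""
    if importlib.util.find_spec("lz4") is None:
        return False
    try:
        import lz4.frame  # noqa: F401
    except ImportError:
        return False
    return True


def lz4_roundtrip_ok(data=b"hello mercurial" * 100):
    """Compress and decompress a small payload; return False if lz4 is missing."""
    if not lz4_bindings_available():
        return False
    import lz4.frame
    return lz4.frame.decompress(lz4.frame.compress(data)) == data


if __name__ == "__main__":
    print("lz4 importable:", lz4_bindings_available())
    print("round trip ok:", lz4_roundtrip_ok())
```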

Once the prerequisites are installed, fetch the latest lz4revlog and install it:

hg clone https://bitbucket.org/facebook/lz4revlog
cd lz4revlog
python setup.py install

Again, on Windows the full path to python might be needed, making the last command look something like:

C:\Python27\python.exe setup.py install

After the extension has been installed, enable it by editing the Mercurial configuration file, ~/.hgrc (Mac/Linux) or %USERPROFILE%\mercurial.ini (Windows), and adding only the line that matches your platform:

[extensions]

# Mac
lz4revlog = /usr/local/lib/python2.7/site-packages/lz4revlog.py

# Linux
lz4revlog = /usr/local/lib/python2.7/dist-packages/lz4revlog.py

# Windows
lz4revlog = C:\Python27\Lib\site-packages\lz4revlog.py
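The correct path depends on where setup.py put the module, so one way to find it is to ask the interpreter itself. The Python 3 sketch below uses hypothetical helper names. It assumes lz4revlog.py was installed into the interpreter's purelib directory, as the paths above suggest. A user or virtualenv install would end up elsewhere:

```python
import os
import sysconfig


def lz4revlog_path():
    """Return the expected location of lz4revlog.py for this interpreter."""
    # "purelib" is the directory for pure-Python modules: typically
    # site-packages, while Debian-based distributions may use dist-packages.
    return os.path.join(sysconfig.get_paths()["purelib"], "lz4revlog.py")


def hgrc_line():
    """Format the [extensions] entry for the Mercurial configuration file."""
    return "lz4revlog = " + lz4revlog_path()


if __name__ == "__main__":
    print(hgrc_line())
    if not os.path.isfile(lz4revlog_path()):
        print("warning: lz4revlog.py not found at that location")
```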

Once the extension has been enabled, all future clones use the new compression format. Existing repositories must be cloned again to use it.

To compare the size in megabytes of the fresh clone against the earlier one:

du -sm earlier-clone
du -sm fresh-clone

Make sure neither directory contains files that do not belong to the repository, such as logs or npm modules, since du counts them as well.
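Where du is not available, for example on Windows, or to skip such stray directories automatically, a rough Python equivalent can sum the file sizes. This sketch reports apparent size, whereas du reports disk usage, so the numbers will differ somewhat. The `dir_size_mb` helper and its excluded directory names are only an example:

```python
import os


def dir_size_mb(root, exclude=("node_modules",)):
    """Sum the apparent size of all files under root, in megabytes (MiB, like du -m).

    Directories whose name appears in `exclude` are skipped entirely.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / (1024 * 1024)


if __name__ == "__main__":
    for clone in ("earlier-clone", "fresh-clone"):
        print(f"{clone}: {dir_size_mb(clone):.0f} MB")
```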

In my test case, earlier-clone is about 965 MB, while fresh-clone came out at about 1147 MB, an increase of about 19%.
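The percentage is the size difference divided by the earlier size, (1147 - 965) / 965 ≈ 0.189. Here is a tiny Python check, using a hypothetical `percent_change` helper:

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    # Sizes reported above for the two clones, in megabytes.
    print(f"{percent_change(965, 1147):.1f} %")  # prints "18.9 %"
```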

I certainly hope the speed gain is noticeable, and since Facebook promises it to be “significant”, I am inclined to believe it. It can also be tested by running time hg verify a few times on two clones of the same repository. The results are in the table below:

counter   earlier    fresh
real      18.902s    21.853s
user      16.321s    14.859s
sys        2.321s     2.978s

Nothing major to be seen there: the fresh clone used less user time, but more real and sys time. Perhaps the difference would show up by some other means, but I will not go deeper at this point.
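To put numbers on the table above, a short Python sketch can compute the fresh clone's relative change for each counter. The `TIMINGS` dictionary simply copies the table, and `relative_changes` is a hypothetical helper:

```python
# Measurements from the table above, in seconds: (earlier, fresh).
TIMINGS = {
    "real": (18.902, 21.853),
    "user": (16.321, 14.859),
    "sys": (2.321, 2.978),
}


def relative_changes(timings):
    """Map each counter to the fresh clone's change versus the earlier one, in percent."""
    return {name: (fresh - earlier) / earlier * 100
            for name, (earlier, fresh) in timings.items()}


if __name__ == "__main__":
    for name, change in relative_changes(TIMINGS).items():
        print(f"{name:5} {change:+6.1f} %")
```

With the table values, this gives roughly +15.6 % real, -9.0 % user and +28.3 % sys for the fresh clone.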

Mercurial also has a built-in timing option, --time, so let’s see how its numbers compare with the benchmarks above:

earlier: real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)
fresh: real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)
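When repeating the --time runs, the figures can be collected from the printed lines instead of being copied by hand. The minimal sketch below uses a hypothetical `parse_hg_time` helper and assumes the layout shown above. Because it searches within the line, any prefix before "real" is ignored. The second number after each "+" is kept under a `*_children` key, on the assumption that it covers time spent in child processes:

```python
import re

# Matches the figures in a line printed by `hg --time`, for example
# "real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)".
TIME_PATTERN = re.compile(
    r"real (?P<real>[\d.]+) secs "
    r"\(user (?P<user>[\d.]+)\+(?P<user_children>[\d.]+) "
    r"sys (?P<sys>[\d.]+)\+(?P<sys_children>[\d.]+)\)"
)


def parse_hg_time(line):
    """Return the timing figures of one --time line as floats, or None if absent."""
    match = TIME_PATTERN.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


if __name__ == "__main__":
    earlier = parse_hg_time("earlier: real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)")
    fresh = parse_hg_time("fresh: real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)")
    print(f"real: {earlier['real']:.3f}s -> {fresh['real']:.3f}s")
```

With the two lines above, real time drops from 24.320 s to 19.690 s, about 19 % less for the fresh clone. With time, by contrast, the fresh clone came out slower.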

I also ran the following commands a few times on both clones, but the numbers varied so little that there was no real difference.

time hg update -r 210 -q
time hg update -r 421 -q
time hg update -r 842 -q
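Repeating these runs can be automated, and the result can be checked with a rough rule of thumb: compare the gap between the two clones' median times with the run-to-run spread. The Python 3 sketch below uses hypothetical helper names. It measures wall-clock (real) time only, unlike time, which also reports user and sys. It also assumes that hg is on the PATH and that both clones sit in the current directory. The noise check is a heuristic, not a statistical test, and it needs at least two runs per clone:

```python
import statistics
import subprocess
import time


def time_commands(commands, cwd=None, repeats=5):
    """Time `repeats` runs of a command sequence; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        for args in commands:
            subprocess.run(args, cwd=cwd, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def difference_is_noise(samples_a, samples_b):
    """True if the gap between the medians is within the larger standard deviation."""
    gap = abs(statistics.median(samples_a) - statistics.median(samples_b))
    noise = max(statistics.stdev(samples_a), statistics.stdev(samples_b))
    return gap <= noise


if __name__ == "__main__":
    updates = [["hg", "update", "-r", rev, "-q"] for rev in ("210", "421", "842")]
    earlier = time_commands(updates, cwd="earlier-clone")
    fresh = time_commands(updates, cwd="fresh-clone")
    print(f"earlier median {statistics.median(earlier):.3f}s, "
          f"fresh median {statistics.median(fresh):.3f}s")
    print("difference within noise:", difference_is_noise(earlier, fresh))
```

The sketch times the three updates as one sample. Mixing individual revisions into a single sample would inflate the spread, because updating to different revisions does different amounts of work.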

Considering that disk space is relatively cheap compared with the work time possibly saved, perhaps this extension should be enabled by default at any development-oriented company that relies on Mercurial.

Please note that if the extension is disabled after it has been used for cloning, that repository becomes unreadable:

abort: repository requires features unknown to this Mercurial: lz4revlog!
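The message suggests that the extension records a requirement named lz4revlog in the repository's .hg/requires file, which Mercurial checks when opening a repository. Newer Mercurial versions may also keep some requirements in .hg/store/requires. Assuming the name matches the abort message, the Python 3 sketch below can tell which clones depend on the extension before it is disabled. The helper names are hypothetical:

```python
import os
import sys


def read_requirements(repo_root):
    """Collect requirement names from .hg/requires and, if present, .hg/store/requires."""
    names = set()
    for parts in ((".hg", "requires"), (".hg", "store", "requires")):
        path = os.path.join(repo_root, *parts)
        try:
            with open(path, encoding="utf-8") as handle:
                names.update(line.strip() for line in handle if line.strip())
        except FileNotFoundError:
            continue  # file absent: not a repository, or an older layout
    return names


def requires_lz4revlog(repo_root):
    """True if the clone is marked as needing the lz4revlog extension (assumed name)."""
    return "lz4revlog" in read_requirements(repo_root)


if __name__ == "__main__":
    for root in sys.argv[1:] or ["."]:
        print(f"{root}: requires lz4revlog = {requires_lz4revlog(root)}")
```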

Luckily, the compression method seems to be widely used enough that it is unlikely to disappear suddenly.

Unfortunately, the speed differences are not convincing. Perhaps I left out something that would have made the real difference. Having said that, one of my projects now uses LZ4 for its revlog, and I will keep using it until further notice. Maybe the benefits will reveal themselves later…