LZ4 is a compression method that promises significantly faster compression and decompression than comparable methods, such as LZMA. See the LZ4 home page for benchmarks.
Facebook has open sourced a Mercurial extension that uses this compression method. The extension is called lz4revlog and is available at Bitbucket. Once it has been enabled, all revlog deltas in new clones are compressed with LZ4 and should thus gain the speed improvements. More about the Mercurial revlog and its format.
Before the extension can be used, the Python bindings for LZ4 need to be installed. This is most likely easiest to do via pip, which means that pip itself needs to be installed first.
pip install lz4
On Windows, the full path to the pip binary might need to be given:
C:\Python27\Scripts\pip.exe install lz4
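Once pip has finished, a quick round trip can confirm that the bindings are actually importable. This is only a sketch and rests on my own assumptions: it uses the lz4.frame module found in recent releases of the bindings (older releases may expose a different interface), it is written for Python 3, and the helper name lz4_roundtrip_ok is made up for illustration.

```python
def lz4_roundtrip_ok(payload=b"mercurial revlog delta " * 100):
    """Return True if lz4 compresses and restores the payload,
    False on a mismatch, or None if lz4.frame cannot be imported."""
    try:
        import lz4.frame  # third-party; installed by "pip install lz4"
    except ImportError:
        return None
    compressed = lz4.frame.compress(payload)
    return lz4.frame.decompress(compressed) == payload


if __name__ == "__main__":
    print("lz4 round trip:", lz4_roundtrip_ok())
```

A result of None means that the interpreter running the check cannot see the bindings, which is worth sorting out before going further.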
Once the prerequisites are installed, fetch the latest
lz4revlog and install it:
hg clone https://bitbucket.org/facebook/lz4revlog
cd lz4revlog
python setup.py install
Again on Windows, the full path to python might be needed, making the last command look something like:
C:\Python27\python.exe setup.py install
After the extension has been installed on the system, it can be enabled by editing the Mercurial configuration file,
~/.hgrc (Mac/Linux) or
[extensions]
# Mac
lz4revlog = /usr/local/lib/python2.7/site-packages/lz4revlog.py
# Linux
lz4revlog = /usr/local/lib/python2.7/dist-packages/lz4revlog.py
# Windows
lz4revlog = C:\Python27\Lib\site-packages\lz4revlog.py
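The exact path depends on where the install step put the module, so instead of guessing it can be asked from Python. The following is a rough sketch under a few assumptions: it needs Python 3.4 or newer for importlib.util.find_spec (the paths above are from Python 2.7), it has to be run with the same interpreter that Mercurial uses, and hgrc_extension_line is a name I made up.

```python
import importlib.util


def hgrc_extension_line(module_name="lz4revlog"):
    """Return an hgrc line pointing at the installed module, or None
    when the module is not visible to this interpreter."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.origin:
        return None
    # spec.origin is the file the module would be loaded from.
    return "{} = {}".format(module_name, spec.origin)


if __name__ == "__main__":
    print("[extensions]")
    print(hgrc_extension_line() or "# lz4revlog was not found")
```

The printed line can then be pasted under the [extensions] header in place of the hard-coded path.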
Once the extension has been enabled, any future clones will use the new compression format. Existing repositories need to be cloned again before they use it.
To compare the size in megabytes of the fresh clone against the earlier one:
du -sm earlier-clone
du -sm fresh-clone
Make sure that neither directory contains files that do not belong to the repository, for example logs or npm modules.
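One way to sidestep stray files altogether is to measure only the .hg directory of each clone, since that is where the revlogs live. Below is a minimal sketch of that idea; tree_size_mb is a hypothetical helper of mine, and it sums apparent file sizes, so the numbers will differ a little from du, which counts disk blocks.

```python
import os


def tree_size_mb(root):
    """Sum the apparent sizes of all regular files under root, in MB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks, count real files only
                total += os.path.getsize(path)
    return total / (1024 * 1024)


if __name__ == "__main__":
    for clone in ("earlier-clone", "fresh-clone"):
        print(clone, round(tree_size_mb(os.path.join(clone, ".hg"))), "MB")
```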
In my test case, the size of earlier-clone is about 965 MB, while fresh-clone came out at about 1147 MB. That is an increase of about 19%.
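For the record, the percentage comes from dividing the growth by the original size. A tiny sketch of the arithmetic, with percent_change being just a throwaway name:

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Sizes in MB as reported by du for the two clones.
print(round(percent_change(965, 1147), 1))  # 18.9
```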
I sure hope that the speed gain is noticeable, and since Facebook promises it to be "significant", I must believe it. It can also be tested by running time hg verify a few times on the two clones of the same repository. Results in the table below:
Nothing major to be seen there. Perhaps the difference would show up when measured some other way, but I will not go deeper at this point.
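Repeating the runs by hand gets tedious, so they could be scripted. Here is a rough sketch, assuming hg is on the PATH and the two clone directories sit next to the script; time_command and compare_clones are names of my own invention, and the median is used simply to dampen the odd slow run.

```python
import statistics
import subprocess
import time


def time_command(command, cwd, runs=5):
    """Run command in cwd `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def compare_clones(clones, command=("hg", "verify"), runs=5):
    """Return {clone: median seconds} for the given clone directories."""
    return {clone: statistics.median(time_command(list(command), clone, runs))
            for clone in clones}


if __name__ == "__main__":
    for clone, secs in compare_clones(["earlier-clone", "fresh-clone"]).items():
        print("%s: %.2f s" % (clone, secs))
```

Keep in mind that the first run on each clone may be slower because of a cold disk cache, so a warm-up run before measuring is probably a good idea.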
Mercurial also comes with a built-in timing option, --time, so let's see how much difference it shows for the above benchmarks:
earlier: real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)
fresh: real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)
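To put the two lines side by side, the "real" value can be pulled out and turned into a relative saving. This is a small sketch that assumes the output keeps the shape shown above; real_seconds is my own helper name.

```python
import re

# Matches the wall-clock part of a "--time" line, e.g. "real 24.320 secs".
REAL_RE = re.compile(r"real\s+([0-9.]+)\s+secs")


def real_seconds(line):
    """Extract the wall-clock seconds from one --time output line."""
    match = REAL_RE.search(line)
    if match is None:
        raise ValueError("no 'real' timing in: %r" % line)
    return float(match.group(1))


earlier = real_seconds("real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)")
fresh = real_seconds("real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)")
saving = (earlier - fresh) / earlier * 100.0
print("fresh clone took %.1f%% less wall-clock time" % saving)
```

This is a single run per clone, so on its own the figure does not say much.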
I also compared the clones by running the following commands a few times on both, but the numbers vary so little that they make no real difference.
time hg update -r 210 -q
time hg update -r 421 -q
time hg update -r 842 -q
Considering that hard disk space is relatively cheap compared with the work time possibly saved, perhaps this extension should be enabled by default at any development-oriented company that relies on Mercurial.
Please note that if the extension is disabled after it has been used for cloning, the repository in question becomes unreadable:
abort: repository requires features unknown to this Mercurial: lz4revlog!
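Mercurial records what a repository needs in the .hg/requires file, one requirement per line, so it is possible to check ahead of time which clones depend on the extension. Below is a minimal sketch of that check; the function names are mine, and I am assuming the requirement is spelled lz4revlog, exactly as in the error message above.

```python
import os


def repo_requirements(repo_root):
    """Return the set of requirement names listed in .hg/requires."""
    requires_path = os.path.join(repo_root, ".hg", "requires")
    try:
        with open(requires_path) as handle:
            return {line.strip() for line in handle if line.strip()}
    except FileNotFoundError:
        # Not a Mercurial repository, or a very old one without the file.
        return set()


def needs_lz4revlog(repo_root):
    """True if the repository cannot be read without the extension."""
    return "lz4revlog" in repo_requirements(repo_root)


if __name__ == "__main__":
    for clone in ("earlier-clone", "fresh-clone"):
        print(clone, "needs lz4revlog:", needs_lz4revlog(clone))
```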
Luckily the compression method seems to be widely used, so it is unlikely to disappear suddenly.
Unfortunately the speed differences are not convincing. Perhaps I left something out that would have made a real difference. Having said that, I now have one of my projects using LZ4 for its revlog and will keep using it until further notice. Maybe the benefits will reveal themselves later...