LZ4 is a compression method that promises significantly faster compression and
decompression than similar methods such as deflate or LZMA.
See the LZ4 home page for benchmarks.
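For a rough local impression of that claim, the three methods can be timed against the same input. The following is only a sketch under a few assumptions: Python 3, zlib standing in for deflate (it wraps a deflate stream in a small header), and a current lz4 package that exposes lz4.frame. The helper names time_codec and compare_codecs are made up for this example, and LZ4 is skipped when the bindings are missing.

```python
import lzma
import time
import zlib

try:
    import lz4.frame  # third-party bindings, assumed installed via: pip install lz4
except ImportError:
    lz4 = None  # the comparison then covers only the standard library codecs


def time_codec(compress, decompress, data):
    """Return (compressed size, compress seconds, decompress seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    middle = time.perf_counter()
    unpacked = decompress(packed)
    end = time.perf_counter()
    assert unpacked == data  # the round trip must be lossless
    return len(packed), middle - start, end - middle


def compare_codecs(data):
    """Time every available codec on the same bytes."""
    codecs = {
        "deflate": (zlib.compress, zlib.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    if lz4 is not None:
        codecs["lz4"] = (lz4.frame.compress, lz4.frame.decompress)
    return {name: time_codec(c, d, data) for name, (c, d) in codecs.items()}


if __name__ == "__main__":
    sample = b"some repetitive sample payload\n" * 50000
    for name, (size, c_secs, d_secs) in compare_codecs(sample).items():
        print(f"{name:8} {size:8d} bytes  "
              f"compress {c_secs:.4f}s  decompress {d_secs:.4f}s")
```

Synthetic input like this says little about real revlog data, so the numbers are best treated as a sanity check only.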
Facebook has open sourced a Mercurial extension that uses this compression
method. The extension is called lz4revlog and is available at Bitbucket.
Once it has been enabled, all revlog deltas in new clones are compressed
with LZ4 and should thus gain the speed improvements.
See the Mercurial documentation for more about the revlog and its format.
Before the extension can be used, the Python bindings for LZ4 need to be
installed. This is most likely easiest to do via pip, which itself needs to
be installed first.
pip install lz4
On Windows, the full path to the pip binary might need to be given:
C:\Python27\Scripts\pip.exe install lz4
Once the prerequisites are installed, fetch the latest lz4revlog and install it:
hg clone https://bitbucket.org/facebook/lz4revlog
cd lz4revlog
python setup.py install
On Windows, the full path to python might again be needed, making the last
command look something like:
C:\Python27\python.exe setup.py install
After the extension has been installed on the system, it can be enabled by
editing the Mercurial configuration file, ~/.hgrc (Mac/Linux) or
%USERPROFILE%\mercurial.ini (Windows), and adding the line that matches the
platform:
[extensions]
# Mac
lz4revlog = /usr/local/lib/python2.7/site-packages/lz4revlog.py
# Linux
lz4revlog = /usr/local/lib/python2.7/dist-packages/lz4revlog.py
# Windows
lz4revlog = C:\Python27\Lib\site-packages\lz4revlog.py
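The exact path depends on where setup.py placed the module, so it may differ from the ones listed above. A small sketch for looking it up, assuming it is run with the same Python interpreter that Mercurial uses and that the mercurial package is importable from it (lz4revlog imports it). The module_path helper is made up for this example.

```python
import importlib


def module_path(name):
    """Return the file a module was loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # On Python 2 this may point at a .pyc file next to the .py source.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    for name in ("lz4", "lz4revlog"):
        print("%s -> %s" % (name, module_path(name) or "not importable here"))
```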
Once the extension has been enabled, any future clones will use the new compression format. Existing repositories need to be cloned again to start using it.
To compare the size, in megabytes, of the fresh clone against the previous one:
du -sm earlier-clone
du -sm fresh-clone
Make sure that neither directory contains files that do not belong to the repository, for example logs or npm modules.
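If cleaning up the directories is not practical, the measurement itself can skip the unrelated files. The sketch below assumes Python 3; the tree_size_mb helper and its default skip list are made up for this example. It sums file sizes instead of allocated disk blocks, so the result will not match du -sm exactly.

```python
import os


def tree_size_mb(root, skip_dirs=("node_modules",)):
    """Sum file sizes under root in MiB, skipping directories named in skip_dirs."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / (1024 * 1024)


if __name__ == "__main__":
    for clone in ("earlier-clone", "fresh-clone"):
        if os.path.isdir(clone):
            print(f"{clone}: {tree_size_mb(clone):.0f} MB")
```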
In my test case, earlier-clone was about 965 MB, while fresh-clone came out
at about 1147 MB. That is an increase of about 19%.
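The percentage follows directly from the two sizes. As a quick check, assuming Python 3 (the percent_change helper is made up for this example):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


growth = percent_change(965, 1147)  # sizes in MB, as reported by du -sm
print(f"{growth:.1f}% larger")  # prints "18.9% larger"
```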
I sure hope that the gain in speed is noticeable, but since
Facebook promises it to be “significant”, I must believe it.
It can also be tested by running time hg verify a few times on the
two different clones of the same repository. The results are in the table below:
counter | earlier | fresh |
---|---|---|
real | 18.902s | 21.853s |
user | 16.321s | 14.859s |
sys | 2.321s | 2.978s |
Nothing major to be seen there. Perhaps the difference shows up by some other means, but I will not go deeper at this point.
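Since single runs are noisy, repeating the command and looking at the median gives a steadier number. A minimal sketch, assuming Python 3 and that hg is on the PATH. The time_command helper and the script name in the usage comment are made up for this example, and it only measures wall-clock time, that is, the real row of the table.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, cwd=None, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python time_verify.py earlier-clone fresh-clone
    for clone in sys.argv[1:]:
        runs = time_command(["hg", "verify"], cwd=clone)
        print(f"{clone}: median {statistics.median(runs):.3f}s of {len(runs)} runs")
```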
Mercurial also comes with a built-in time measuring option, --time, so
let’s see how its numbers differ from the above benchmarks:
earlier: real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)
fresh: real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)
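To put the two lines side by side, the figures can be parsed and compared. This is a sketch assuming Python 3 and the summary format exactly as shown above. The regular expression and the parse_time_line helper are made up for this example and have not been checked against other Mercurial versions.

```python
import re

# Matches e.g. "real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)"
TIME_LINE = re.compile(
    r"real (?P<real>[\d.]+) secs "
    r"\(user (?P<user>[\d.]+)\+[\d.]+ sys (?P<sys>[\d.]+)\+[\d.]+\)"
)


def parse_time_line(line):
    """Extract real, user and sys seconds from one --time summary line."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError(f"not a --time summary line: {line!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


earlier = parse_time_line("real 24.320 secs (user 17.610+0.000 sys 2.970+0.000)")
fresh = parse_time_line("real 19.690 secs (user 13.900+0.000 sys 2.460+0.000)")

for key in ("real", "user", "sys"):
    change = (fresh[key] - earlier[key]) / earlier[key] * 100.0
    print(f"{key}: {earlier[key]:.3f}s -> {fresh[key]:.3f}s ({change:+.1f}%)")
```

With these figures the real time of the fresh clone comes out roughly 19% lower, although a single run per clone is thin evidence.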
I also compared running the following commands a few times on both clones, but the numbers vary so little that they make no real difference.
time hg update -r 210 -q
time hg update -r 421 -q
time hg update -r 842 -q
Considering that hard disk space is relatively cheap compared to the work time possibly saved, perhaps this extension should be used by default by any development oriented company that relies on Mercurial.
Please note that if the extension is disabled after it has been used for cloning, the repository in question becomes unreadable:
abort: repository requires features unknown to this Mercurial: lz4revlog!
Luckily the compression method seems to be used widely enough, so it is unlikely to disappear suddenly.
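Before disabling the extension, it may be worth checking which clones depend on it. Mercurial records the features a repository needs in its .hg/requires file, one name per line. The sketch below assumes Python 3 and that the requirement is stored under the name lz4revlog, as the abort message suggests. Both helper names are made up for this example.

```python
import os


def repo_requirements(repo_root):
    """Return the set of requirement names listed in <repo_root>/.hg/requires."""
    path = os.path.join(repo_root, ".hg", "requires")
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def needs_lz4revlog(repo_root):
    """True if the repository at repo_root lists lz4revlog as a requirement."""
    return "lz4revlog" in repo_requirements(repo_root)


if __name__ == "__main__":
    for clone in ("earlier-clone", "fresh-clone"):
        if os.path.isdir(os.path.join(clone, ".hg")):
            print(f"{clone}: needs lz4revlog = {needs_lz4revlog(clone)}")
```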
Unfortunately the speed differences are not convincing. Perhaps I left out something that would have made a real difference. Having said that, I now have one of my projects using LZ4 for its revlog and will keep using it until further notice. Maybe the benefits will reveal themselves later…