Xlunzip

Introduction

Xlunzip is a test tool for the lzip decompression code of my lzip patch for linux. Xlunzip is similar to lunzip, but it uses the lzip_decompress linux module as a backend. Xlunzip tests the module in stream, buffer-to-buffer, and mixed decompression modes, including in-place decompression (using the same buffer for input and output). You can use xlunzip to verify that the module produces correct results when decompressing single-member files, multimember files, or the concatenation of two or more compressed files. Xlunzip can be used with unzcrash to test the robustness of the module when decompressing corrupted data.

The distributed index feature of the lzip format allows xlunzip to decompress concatenated files in place. This can't be guaranteed to work with formats like gzip or bzip2 because they can't detect whether a high compression ratio in the first members of the multimember data is being masked by a low compression ratio in the last members.

The xlunzip tarball contains a copy of the lzip_decompress module and can be compiled and tested without downloading or applying the patch to the kernel.

My lzip patch for linux can be found at http://download.savannah.gnu.org/releases/lzip/kernel/

Learn more about lzip in the Lzip Home Page.

Lzip related components in the kernel

The lzip_decompress module in lib/lzip_decompress.c provides a versatile lzip decompression function able to do buffer to buffer decompression or stream decompression with fill and flush callback functions. The usage of the function is documented in include/linux/lzip.h.

For decompressing the kernel image, initramfs, and initrd, there is a wrapper function in lib/decompress_lunzip.c providing the same common interface as the other decompress_*.c files, which is defined in include/linux/decompress/generic.h.

Analysis of the in-place decompression

In order to decompress the kernel in place (using the same buffer for input and output), the compressed data is placed at the end of the buffer used to hold the decompressed data. The buffer must be large enough to contain, after the decompressed data, extra space for a marker, a trailer, the maximum possible data expansion, and (if the compressed data consists of more than one member) N-1 empty members.

                 |------ compressed data ------|
                 V                             V
|----------------|-------------------|---------|
^                                    ^  extra
|-------- decompressed data ---------|

The input pointer initially points to the beginning of the compressed data and the output pointer initially points to the beginning of the buffer. Decompressing compressible data reduces the distance between the pointers, while decompressing uncompressible data increases the distance. The extra space must be large enough that the output pointer does not overrun the input pointer even if all the overlap between compressed and decompressed data is uncompressible. The worst case is very compressible data followed by uncompressible data because in this case the output pointer increases faster when the input pointer is smaller.

        |                        *   <-- input pointer
        |                    *   ,   <-- output pointer
        |                * ,  '
        |            x  '            <-- overrun (x)
memory  |        * ,'
address |    *   ,'
        |*     ,'
        |    ,'
        |  ,'
        |,'
        '--------------------------
                    time
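
The pointer chase described above can be sketched with a deliberately simplified model. The struct chunk type and the in_place_fits function below are hypothetical names introduced for illustration; they are not part of xlunzip or of the lzip_decompress module. The model assumes the decoder works in chunks, consuming all the input of a chunk before writing its output, and it checks for overrun only at chunk boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a simplified decoder model (an assumption, not the real
   module): the decoder consumes 'in' compressed bytes and then produces
   'out' decompressed bytes. */
struct chunk { uint64_t in, out; };

/* Returns 1 if the data can be decompressed in place with 'extra' bytes
   of space after the decompressed data, 0 if the output pointer overruns
   the input pointer. Checks are made only at chunk boundaries. */
static int in_place_fits(const struct chunk *chunks, size_t n, uint64_t extra)
{
    uint64_t total_in = 0, total_out = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        total_in += chunks[i].in;
        total_out += chunks[i].out;
    }
    /* The compressed data must fit in the buffer. */
    assert(total_out + extra >= total_in);

    /* Compressed data sits at the end of the buffer; output starts at 0. */
    uint64_t in_pos = total_out + extra - total_in;
    uint64_t out_pos = 0;

    for (i = 0; i < n; ++i) {
        in_pos += chunks[i].in;     /* input of this chunk is consumed */
        out_pos += chunks[i].out;   /* then its output is written */
        if (out_pos > in_pos)
            return 0;               /* output overwrote unread input */
    }
    return 1;
}
```

In this model, a very compressible chunk (1000 bytes in, 1000000 out) followed by an uncompressible one (1014000 bytes in, 1000000 out) needs 14000 extra bytes, which is exactly the expansion of the uncompressible part. The same two chunks in the opposite order fit with no extra space at all, which is why the order matters for the worst case.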

All we need to know to calculate the minimum required extra space is:

The maximum expansion ratio of LZMA data is about 1.4%. Rounding this up to 1/64 (1.5625%) and adding 36 bytes per input member, the extra space required to decompress lzip data in place is: "extra_bytes = (compressed_size >> 6) + members * 36"
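
As a minimal sketch, the formula above can be written as a small C helper. The name lzip_extra_bytes and the macro LZIP_MEMBER_OVERHEAD are hypothetical and chosen here for illustration; only the arithmetic comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Per-member overhead in bytes, as given in the formula above. */
#define LZIP_MEMBER_OVERHEAD 36u

/* Hypothetical helper (not part of the lzip patch): extra space needed
   after the decompressed data to decompress lzip data in place.
   extra_bytes = (compressed_size >> 6) + members * 36 */
static uint64_t lzip_extra_bytes(uint64_t compressed_size, uint64_t members)
{
    assert(members >= 1);   /* lzip data contains at least one member */
    return (compressed_size >> 6) + members * LZIP_MEMBER_OVERHEAD;
}
```

For example, a single-member file of 1 MiB (1048576 bytes) of compressed data gives 16384 + 36 = 16420 extra bytes.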

Using the compressed size to calculate the extra_bytes (as in the equation above) may slightly overestimate the amount of space required in the worst case. But calculating the extra_bytes from the uncompressed size (as linux currently does) is wrong, and inefficient for high compression ratios. The formula used in arch/x86/boot/header.S, "extra_bytes = (uncompressed_size >> 8) + 65536", fails to decompress 1 MB of zeros followed by 8 MB of random data, and wastes memory for compression ratios larger than 4:1.
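
The two formulas can be compared numerically. The sketch below uses hypothetical function names, reads "MB" as MiB, and assumes that the 1 MiB of zeros compresses to almost nothing while the 8 MiB of random data does not compress, so the compressed size is taken as roughly 8 MiB. These figures are illustrative assumptions, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Formula quoted above from arch/x86/boot/header.S. */
static uint64_t extra_from_uncompressed(uint64_t uncompressed_size)
{
    return (uncompressed_size >> 8) + 65536;
}

/* Formula based on the compressed size, as proposed above. */
static uint64_t extra_from_compressed(uint64_t compressed_size,
                                      uint64_t members)
{
    assert(members >= 1);   /* lzip data contains at least one member */
    return (compressed_size >> 6) + members * 36;
}
```

With 9 MiB of uncompressed data, extra_from_uncompressed(9437184) yields 36864 + 65536 = 102400 bytes, while extra_from_compressed(8388608, 1) yields 131072 + 36 = 131108 bytes, so the header.S formula reserves less than the worst-case expansion of the uncompressible part can require. At a compression ratio of exactly 4:1 the terms (uncompressed_size >> 8) and (compressed_size >> 6) are equal; above that ratio the header.S formula reserves more than needed. For instance, 64 MiB compressed to 8 MiB gives 327680 bytes against 131108.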

Documentation

Xlunzip only includes a man page and a README file. For information about the lzip file format see the online manual of lzip below.

The manual is available in the info system of the GNU Operating System. Use "info" to access the top level info page. Use "info lzip" to access the lzip section directly.

An online manual for lzip can be found here.

Download

The latest released version of xlunzip can be found at http://download.savannah.gnu.org/releases/lzip/xlunzip/. You may also subscribe to lzip-bug and receive an email every time a new version is released.

Once xlunzip is installed, the files from archive "foo.tar.lz" can be extracted using the command "xlunzip -cd foo.tar.lz | tar -xf -".

How to get help

For general discussion of bugs in xlunzip the mailing list lzip-bug@nongnu.org is the most appropriate forum. Please send messages as plain text. Please do not send messages encoded as HTML, encoded as base64 MIME, or included in multiple formats. Please include a descriptive subject line. If all the subjects are "bug in xlunzip", it is impossible to tell the messages apart.

An archive of the bug report mailing list is available at http://lists.gnu.org/mailman/listinfo/lzip-bug.

How to help

To contact the author, either to report a bug or to contribute fixes or improvements, send mail to lzip-bug@nongnu.org. Please send messages as plain text. Patches should be in unified diff format against the latest version and should include a text description.

See also the lzip project page at Savannah.

Licensing

Xlunzip is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.


Copyright © 2020 Antonio Diaz Diaz
Lzip logo Copyright © 2013 Sonia Diaz Pacheco

You are free to copy, modify and distribute all or part of this article without limitation.

Updated: 2020-06-26