Data compression using zstd

1 Introduction

Zstd (or Zstandard) is a data compression library developed by Facebook. It is optimized for high compression and decompression speeds. Another interesting feature of this library is "dictionary compression": you train the algorithm on a set of sample files to produce a dictionary, which must then be supplied to both the compressor and the decompressor. This improves compression of small files. The main use case is a large collection of small files of the same type (same statistics) that need to be compressed separately.

This documentation is particularly helpful for getting started with zstd, and some example source code is provided here.

This tecmint article might also be helpful. The official documentation can be found here.

1.1 Installing zstd

The easiest way is to build from source. You can simply clone the repo and install the libraries as root. Run the following commands in the directory where you want to clone the repo.

$ git clone https://github.com/facebook/zstd.git
$ cd zstd
$ sudo make install

By default, the header files are installed in /usr/local/include and the library files in /usr/local/lib.

1.2 Basic compression and decompression

Download common.h, dictionary_compression.c and dictionary_decompression.c to your local folder. Also create an empty file called emptydict in this directory (we won't be doing dictionary compression for now, so this will serve as our dictionary).

To compile dictionary_compression.c, run

$ gcc -Wall -I/usr/local/include/ -c dictionary_compression.c

For linking, use

$ gcc -L/usr/local/lib/ dictionary_compression.o -o compressor -lzstd

This will create an executable file called compressor. To run this, use

$ ./compressor raw_filename emptydict

You can also use terminal commands to compress using zstd:

$ zstd -T4 -10 raw_filename

This command compresses the file raw_filename at compression level 10 using at most 4 threads, and writes the result to raw_filename.zst. Usage and options for compression and decompression can be found using man zstd.

One can train the dictionary using

$ zstd --train -B1024 --maxdict=10240 training_file

This command splits training_file into chunks of 1024 bytes each and creates a dictionary of at most 10240 bytes. The resulting dictionary file (named dictionary by default) can be used for dictionary compression.



Author: Shashank Vatedka

Created: 2022-01-31 Mon 10:52