Data compression using zstd

1 Introduction

Zstd (or Zstandard) is a data compression library developed by Facebook. It is especially optimized for high compression and decompression speeds. Another interesting feature of this library is "dictionary compression," wherein you train the algorithm on a set of sample files to produce a dictionary, which must then be supplied to both the compressor and the decompressor. This improves the compression ratio for small files. The main use case for dictionary compression is when you have a lot of small files of the same type (that is, with similar statistics) that need to be compressed separately.

This documentation is particularly helpful for getting started with zstd, and some example source code is provided here.

This tecmint article might also be helpful. The official documentation can be found here.

1.1 Installing zstd

The easiest way is to build from source. You can simply clone the repo and install the libraries as root. Run the following commands in the directory where you want to clone the repo.

$ git clone https://github.com/facebook/zstd.git
$ cd zstd
$ sudo make install

By default, the header files are installed in /usr/local/include, while the library files are installed in /usr/local/lib.

1.2 Basic compression and decompression

Download common.h, dictionary_compression.c and dictionary_decompression.c to your local folder. Also create an empty file called emptydict in this directory (we won't be doing dictionary compression for now, so this will serve as our dictionary).

To compile dictionary_compression.c, run

$ gcc -Wall -I/usr/local/include/ -c dictionary_compression.c -lm

For linking, use

$ gcc -L/usr/local/lib/ dictionary_compression.o -o compressor -lzstd

This will create an executable file called compressor. To run this, use

$ ./compressor raw_filename emptydict

Author: Shashank Vatedka

Created: 2021-06-11 Fri 16:09