Course Info / Syllabus

Labs

Lecture Topics

Exam Reviews

Source Code & Other Resources


Operating Systems Concepts & Design

Archiving and Compression

In the Linux and Unix world, “archiving” and “compression” are two distinct operations that are often performed together. While many people use the terms interchangeably, they serve different functional purposes in the filesystem.


1. Archiving vs. Compression

Think of archiving as a “box” and compression as “vacuum-sealing” that box.


2. The Standard: The “Tarball”

The most common file format you will encounter is the tarball (e.g., backup.tar.gz). This is a two-step process: tar bundles the files, and then a compressor shrinks the bundle.

Common Formats & Tools

Extension Compressor Speed Compression Ratio
.tar None (Archive only) Instant 0%
.tar.gz gzip Fast Good
.tar.bz2 bzip2 Slow Better
.tar.xz xz Very Slow Best

3. Essential Commands

The tar command is famous for its “alphabet soup” of options. Here is the shorthand to remember:

Creating Archives

Extracting Archives

Regardless of the compression type, modern versions of tar are smart enough to auto-detect the format:


4. Why Use Different Compressors?

Choosing a compressor depends on your specific goal:

  1. Gzip (.gz): Use this for daily backups or log rotation. It is very light on the CPU and fast to compress/decompress.
  2. XZ (.xz): Use this for distributing software or long-term storage. It takes a long time to compress, but the resulting file is much smaller, saving bandwidth and disk space.

5. Other Tools: Zip and Unzip

While tar is the native Linux standard, zip is the universal standard for cross-platform compatibility (Windows/Mac).


6. Pro Tip: Listing without Extracting

Before you dump 10 GB of data into your current directory, it is a good habit to “peek” inside the box first: