Operating Systems Concepts & Design
In the Linux and Unix world, “archiving” and “compression” are two distinct operations that are often performed together. While many people use the terms interchangeably, they serve different functional purposes in the filesystem.
Think of archiving as a “box” and compression as “vacuum-sealing” that box.
- Archiving tool: tar (Tape Archive).
- Compression tools: gzip, bzip2, xz.

The most common file format you will encounter is the tarball (e.g., backup.tar.gz). Creating one is a two-step process: tar bundles the files, and then a compressor shrinks the bundle.
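To make the two-step nature concrete, here is a minimal sketch that runs the steps separately and then in a single command. The directory `demo_data` and the archive names are placeholders invented for this illustration, and the script assumes `tar` and `gzip` are installed.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; every name below is a placeholder.
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo_data
printf 'hello\n' > demo_data/a.txt
printf 'world\n' > demo_data/b.txt

# Step 1: archive only (bundle the files, no size reduction).
tar -cf two_step.tar demo_data

# Step 2: compress the bundle; gzip replaces two_step.tar with two_step.tar.gz.
gzip two_step.tar

# Both steps at once: -z tells tar to run the bundle through gzip.
tar -czf one_step.tar.gz demo_data

# Both archives should list the same members.
tar -tzf two_step.tar.gz
tar -tzf one_step.tar.gz
```

Either route ends with a gzip-compressed tar archive; the one-step form simply saves you from handling the intermediate .tar file.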
| Extension | Compressor | Speed | Compression Ratio |
|---|---|---|---|
| .tar | None (Archive only) | Instant | 0% |
| .tar.gz | gzip | Fast | Good |
| .tar.bz2 | bzip2 | Slow | Better |
| .tar.xz | xz | Very Slow | Best |
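The trade-offs in the table can be checked on your own data. The sketch below builds a small, highly repetitive sample (a placeholder, not a realistic workload), archives it once, and compresses that archive with whichever of the three tools is installed. The file names are assumptions made for the example, and results on real data will differ.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir sample

# Repetitive text compresses unusually well; treat the numbers as illustrative.
i=0
while [ "$i" -lt 2000 ]; do
    printf 'line %d: the quick brown fox jumps over the lazy dog\n' "$i"
    i=$((i + 1))
done > sample/log.txt

# Archive once, uncompressed.
tar -cf sample.tar sample

# Compress a copy of the same archive with each tool that is available.
for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        case $tool in
            gzip)  ext=gz ;;
            bzip2) ext=bz2 ;;
            xz)    ext=xz ;;
        esac
        # -c writes to stdout and leaves sample.tar in place.
        "$tool" -c sample.tar > "sample.tar.$ext"
    fi
done

# Print the size in bytes of each file.
wc -c sample.tar*
```

Prefixing each compression command with `time` is a simple way to see the speed column for yourself.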
The tar command is famous for its “alphabet soup” of options. Here is the shorthand to remember:

    tar -czvf backup.tar.gz /path/to/data
    tar -cJvf backup.tar.xz /path/to/data

- -c: Create
- -z / -J: Compression type (z for gzip, J for xz)
- -v: Verbose (show files as they are processed)
- -f: File (the name of the archive comes next)

When extracting, modern versions of tar auto-detect the compression format, regardless of which compressor was used:

    tar -xzvf backup.tar.gz

- -x: Extract (the -z is optional here, because tar detects the compression on its own)

Choosing a compressor depends on your specific goal:
- gzip (.gz): Use this for daily backups or log rotation. It is light on the CPU and fast to compress and decompress.
- xz (.xz): Use this for distributing software or long-term storage. It takes a long time to compress, but the resulting file is much smaller, saving bandwidth and disk space.

While tar is the native Linux standard, zip is the universal standard for cross-platform compatibility (Windows/Mac).

Before you dump 10 GB of data into your current directory, it is a good habit to “peek” inside the box first:

    tar -tvf backup.tar.gz

- -t: Tell/List
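
Putting the pieces together, the following sketch walks through a full round trip: create a tarball, list it, extract it somewhere safe, and compare the result with the original. The directory and file names are placeholders. The `-C` option (change to a directory before extracting) is not covered above but is available in GNU tar and bsdtar. The script also relies on those implementations detecting gzip compression automatically when reading.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/sub
printf 'alpha\n' > data/one.txt
printf 'beta\n'  > data/sub/two.txt

# Create a gzip-compressed tarball (-v omitted to keep the output quiet).
tar -czf backup.tar.gz data

# Peek first: list the contents without writing anything to disk.
tar -tvf backup.tar.gz

# Extract into a separate directory so nothing lands in the current one.
mkdir restored
tar -xf backup.tar.gz -C restored

# diff -r reports any differences; the message prints only if there are none.
diff -r data restored/data && echo "round trip OK"
```

Extracting into a dedicated directory is a cautious default: if the archive turns out to contain many top-level files, they stay contained in one place.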