Using `wget` and `zip` to Archive a Website

By: Cam Wohlfeil
Published: 2019-04-22 1200 EDT
Category: Solutions
Tags: linux

The wget command to mirror a whole site is long but easy enough:

wget --recursive --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains calculusmadeeasy.org --no-parent http://calculusmadeeasy.org/
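Roughly, the options break down like this (--html-extension is the older spelling of what newer wget releases call --adjust-extension):

--recursive: follow links and download the pages they point to
--page-requisites: also grab the images, CSS, and JS each page needs
--html-extension: save pages with a .html suffix
--convert-links: rewrite links so the local copy browses offline
--restrict-file-names=windows: escape characters that aren't valid in Windows filenames
--domains calculusmadeeasy.org: don't wander off to other domains
--no-parent: never climb above the starting directory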

Now that you have the site downloaded, it's time to archive it:

To zip some files: zip squash.zip file1 file2 file3

To zip a directory: zip -r squash.zip dir1

To unzip: unzip squash.zip

I only really use the zip -r9 archive.zip files... syntax (-r = recursive, -9 = maximum compression).
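Putting the two steps together for the site mirrored above (wget writes the mirror into a directory named after the domain; the archive name is just an example), the whole thing is:

zip -r9 calculusmadeeasy.zip calculusmadeeasy.org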

You can also compress files with the GNU tar program: tar -zcvf myfile.tgz . will archive and gzip-compress the current directory (note the trailing dot telling tar what to include).

To extract that file, use: tar -zxvf myfile.tgz. That assumes, of course, that your tar can handle the compression as well as combining files into one archive. If not, you can run tar cvf followed by gzip (again, if available) to compress, and gunzip followed by tar xvf to extract. Or use tar jcvf file.tar.bz2... to compress in bzip2 format, or tar Jcvf file.tar.xz ... for xz compression.
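For the same mirrored site, the tar equivalents look roughly like this (directory and archive names carried over from the zip example; pick whichever compressor your system has):

tar -zcvf calculusmadeeasy.tgz calculusmadeeasy.org       # gzip
tar -jcvf calculusmadeeasy.tar.bz2 calculusmadeeasy.org   # bzip2
tar -Jcvf calculusmadeeasy.tar.xz calculusmadeeasy.org    # xz

tar cvf calculusmadeeasy.tar calculusmadeeasy.org         # fallback: plain tar first...
gzip calculusmadeeasy.tar                                 # ...then compress separately (leaves .tar.gz)
gunzip calculusmadeeasy.tar.gz && tar xvf calculusmadeeasy.tar   # and to undo the fallback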

Note on file sizes:

ls -l --block-size=M gives you a long-format listing (needed to actually see the file sizes) with sizes rounded up to the nearest MiB. If you want MB (10^6 bytes) rather than MiB (2^20 bytes) units, use --block-size=MB instead. If you don't want the M suffix attached to the file size, use something like --block-size=1M.
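Applied to the archive from the zip example above, the three variants look like:

ls -l --block-size=M calculusmadeeasy.zip    # rounded up to MiB, with an M suffix
ls -l --block-size=MB calculusmadeeasy.zip   # decimal megabytes
ls -l --block-size=1M calculusmadeeasy.zip   # MiB, no suffix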

If you simply want file sizes in "reasonable" units rather than specifically megabytes, use -lh to get a long-format listing with human-readable sizes. Each size is scaled to whatever unit keeps it to roughly 1-3 digits, so you'll see values like 6.1K, 151K, 7.1M, 15M, 1.5G, and so on.
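For example, on the archive and on the files inside the mirrored directory (names carried over from the examples above):

ls -lh calculusmadeeasy.zip
ls -lh calculusmadeeasy.org/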