Using `wget` and `zip` to Archive a Website
By: Cam Wohlfeil
Published: 2019-04-22 1200 EDT
Category: Solutions
Tags: linux
The command is long but easy enough:
wget --recursive --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains calculusmadeeasy.org --no-parent http://calculusmadeeasy.org/
`--recursive`
: download the entire website.
`--domains website.org`
: don't follow links outside website.org.
`--no-parent`
: don't ascend to the parent directory, so nothing above the starting URL is downloaded (for example, a crawl started at tutorials/html/ stays inside tutorials/html/).
`--page-requisites`
: get all the elements that compose the page (images, CSS and so on).
`--html-extension`
: save files with the .html extension. Newer wget releases call this option `--adjust-extension`.
`--convert-links`
: convert links so that they work locally, offline.
`--restrict-file-names=windows`
: modify filenames so that they will work in Windows as well.
`--no-clobber`
: don't overwrite any existing files (useful when a download is interrupted and resumed). It is not part of the command above because it doesn't work together with `--convert-links`.
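To reuse the command for other sites, the options can be wrapped in a small shell function. This is only a sketch: the name `mirror_site` is made up for illustration, and it assumes the site is served over plain HTTP from its bare domain, as in the example above.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper name, not part of wget.
# Usage: mirror_site DOMAIN
mirror_site() {
    domain="$1"
    # Same options as the long command, with the domain factored out.
    wget --recursive \
         --page-requisites \
         --html-extension \
         --convert-links \
         --restrict-file-names=windows \
         --domains "$domain" \
         --no-parent \
         "http://$domain/"
}

# Example call (starts a real download, so it is left commented out):
# mirror_site calculusmadeeasy.org
```

By default wget saves a recursive download into a directory named after the host, so the call above would leave the site in `calculusmadeeasy.org/`.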
Now that you have the site downloaded, time to archive it:
To zip some files: `zip squash.zip file1 file2 file3`
To zip a directory: `zip -r squash.zip dir1`
To unzip: `unzip squash.zip`
I only really use the `zip -r9 archive.zip files...` syntax (`-r` = recursive, `-9` = max compression).
You can also compress files with the GNU tar program: `tar -zcvf myfile.tgz .` will archive the current directory.
To extract that file, use: `tar -zxvf myfile.tgz`. That assumes, of course, that your tar can compress as well as combine files into one archive. If not, you can use `tar cvf` followed by `gzip` (again, if available) to compress, and `gunzip` followed by `tar xvf` to extract. Or use `tar jcvf file.tar.bz2...` to compress in bzip2 format, or `tar Jcvf file.tar.xz ...` for xz compression.
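The tar variants can be compared side by side in a scratch directory. This is a minimal sketch with placeholder names (`site`, `site.tgz`, `site2.tar`, `out`); it assumes a tar that supports `-z` and `-C`, which GNU tar and BSD tar both do.

```shell
#!/bin/sh
set -e
# Scratch directory with a stand-in for the downloaded site (hypothetical names).
workdir="$(mktemp -d)"
mkdir -p "$workdir/site"
echo 'hello' > "$workdir/site/index.html"

# One step: create a gzip-compressed archive of the site directory.
# -C switches to $workdir first so the archive stores relative paths.
tar -zcf "$workdir/site.tgz" -C "$workdir" site

# Two steps: plain tar, then gzip (for a tar without built-in compression).
tar -cf "$workdir/site2.tar" -C "$workdir" site
gzip "$workdir/site2.tar"    # replaces site2.tar with site2.tar.gz

# List the compressed archive, then extract it into a separate directory.
tar -ztf "$workdir/site.tgz"
mkdir "$workdir/out"
tar -zxf "$workdir/site.tgz" -C "$workdir/out"
```

The `v` flag is left out here only to keep the output short; adding it back prints each file name as it is processed.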
Note on file sizes:
`ls -l --block-size=M` will give you a long format listing (needed to actually see the file size) and round file sizes up to the nearest MiB. If you want MB (10^6 bytes) rather than MiB (2^20 bytes) units, use `--block-size=MB` instead. If you don't want the M suffix attached to the file size, you can use something like `--block-size=1M`.
If you simply want file sizes in "reasonable" units, rather than specifically megabytes, then you can use `-lh` to get a long format listing with human-readable file sizes. This picks a unit for each file so that sizes are shown with about 1-3 digits, so you'll see file sizes like 6.1K, 151K, 7.1M, 15M, 1.5G and so on.
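To see how the size options differ, it can help to run them against a file of known size. The sketch below creates a 2 MiB file in a scratch directory and lists it four ways; the file name `sample.bin` is a placeholder, and `--block-size` assumes GNU `ls` (coreutils).

```shell
#!/bin/sh
set -e
workdir="$(mktemp -d)"
sample="$workdir/sample.bin"    # hypothetical file name
# Create a file of exactly 2 MiB (2048 blocks of 1024 bytes).
dd if=/dev/zero of="$sample" bs=1024 count=2048 2>/dev/null

ls -l --block-size=M  "$sample"    # size in MiB, with an M suffix
ls -l --block-size=MB "$sample"    # size in units of 10^6 bytes
ls -l --block-size=1M "$sample"    # size in MiB, no suffix
ls -lh "$sample"                   # unit chosen per file
```

The size is the fifth column of each listing, so comparing that column across the four commands shows the effect of each option.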