Thursday, August 18, 2011

Download an Entire Website Using wget

If you want to download an entire website for offline viewing, this is worth learning. wget is a handy tool built into most Linux systems. The basic syntax of wget is:
$ wget [option]... [URL]...
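For example, in its simplest form wget fetches a single page (example.org is used here as a placeholder URL):

```shell
# Download a single file into the current directory
wget https://example.org/index.html
```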

Using this, we can simply download all the URLs specified on the command line. Some of the useful options are:

  --recursive : download the complete website recursively
  --page-requisites : get all elements needed to display the page (images, CSS, etc.)
  --html-extension : save files with the .html extension
  --convert-links : convert links so they work offline
  --domains website.org : don't follow links outside this domain
  --no-parent : don't ascend to the parent directory when retrieving recursively
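Put together, a complete invocation might look like this (website.org is a placeholder for the site you actually want to mirror):

```shell
# Mirror a site for offline viewing; website.org is a placeholder domain
wget \
  --recursive \
  --page-requisites \
  --html-extension \
  --convert-links \
  --domains website.org \
  --no-parent \
  http://website.org/
```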
Many sites refuse the connection or send back a blank page if they detect that you are not using a web browser. To work around this, use -U <browser-name> to tell the site you are using a commonly accepted browser:
wget -r -p -U Mozilla http://www.aaaaaaaa.com/restricted.html

Note: wget is also available for Windows. I haven't used it myself, but you can try it.
Follow this link: http://gnuwin32.sourceforge.net/packages/wget.htm

Enjoy!