
Controlling recursion: -rec

By default, recursion is off. This means that Big Brother reads the documents you explicitly specify (either on the command line or using the -stdin option), makes sure that the links they contain are valid, and stops.

If you want to check a whole site at once, you must turn recursion on. When recursion is on, Big Brother not only checks each link, but also determines whether it points to an HTML document. If it does, Big Brother fetches the document (which can be local or remote) and checks the links in it. If one of these links points to an HTML document, it is also fetched, and so on, recursively.

Of course, this process has to stop at some point, otherwise you are likely to end up checking the whole World Wide Web! So, when recursion is on, you must provide a regular expression that defines the boundary of your site. Every time Big Brother finds a link, it matches the link's address against the regular expression you have provided. If the address matches, Big Brother fetches the document and checks it recursively; otherwise, it checks the link but doesn't follow it.
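
The stopping rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Big Brother's actual implementation: the LINKS table is a hypothetical in-memory stand-in for fetching documents, every target is assumed to be an HTML document, the example.org and other.example addresses are made up, and Python's re module stands in for Big Brother's regular expression engine.

```python
import re

# Hypothetical link graph standing in for real fetches:
# each URL maps to the links found in that document.
LINKS = {
    "http://example.org/": ["http://example.org/a.html", "http://other.example/"],
    "http://example.org/a.html": ["http://example.org/", "http://example.org/b.html"],
    "http://example.org/b.html": [],
    "http://other.example/": ["http://other.example/deep.html"],
}

def crawl(start, boundary):
    """Return (checked, followed). Every link found is checked, but only
    links matching the boundary regexp are fetched and scanned in turn."""
    pattern = re.compile(boundary)
    checked, followed = set(), set()
    pending = [start]  # the explicitly specified document is always read
    while pending:
        url = pending.pop()
        if url in followed:
            continue
        followed.add(url)
        for link in LINKS.get(url, []):
            checked.add(link)
            # Follow the link only if it lies inside the site boundary.
            if pattern.search(link) and link not in followed:
                pending.append(link)
    return checked, followed

checked, followed = crawl("http://example.org/", r"^http://example\.org/")
```

In this sketch, http://other.example/ ends up in checked but not in followed, so the link found inside it (deep.html) is never seen at all. That is the sense in which the regular expression draws the boundary of the site.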

Here is the simplest, and most common, example. The address of my home page is http://pauillac.inria.fr/~fpottier/. If I want to check my whole site in a single run of Big Brother, I invoke it with the following option:

-rec "^http://pauillac\.inria\.fr/~fpottier/"

Since the expression doesn't end with a $ sign, it matches all URLs which start with http://pauillac.inria.fr/~fpottier/. Notice that I inserted a backslash character in front of each dot, because otherwise a dot matches any character.
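
The effect of the anchor and of the backslashes can be tried out with a short Python sketch. It assumes that Python's re module treats this simple pattern the same way Big Brother's regexp engine does; the page name page.html and the lookalike host are hypothetical, chosen only for the test.

```python
import re

escaped = re.compile(r"^http://pauillac\.inria\.fr/~fpottier/")
unescaped = re.compile(r"^http://pauillac.inria.fr/~fpottier/")

def matches(pattern, url):
    # search() plus the leading ^ means "the URL starts with this prefix".
    return pattern.search(url) is not None

inside = "http://pauillac.inria.fr/~fpottier/page.html"  # hypothetical page
lookalike = "http://pauillacXinria.fr/~fpottier/"         # hypothetical host

print(matches(escaped, inside))       # True: no $ sign, so any suffix is accepted
print(matches(escaped, lookalike))    # False: \. only matches a literal dot
print(matches(unescaped, lookalike))  # True: a bare dot matches the X as well
```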

Here is another example: suppose I maintain several Web servers, but all of them are within the domain orange.com. Then, I can use the following regexp:

-rec "^http://.*\.orange\.com/"

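This matches the address of any document located on any server whose name ends in .orange.com. Note that it does not match a server named just orange.com, because the expression requires a dot in front of orange.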
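
Here is a quick way to probe this expression, again using Python's re module as a stand-in for Big Brother's regexp engine (an assumption). All host names below are hypothetical.

```python
import re

domain = re.compile(r"^http://.*\.orange\.com/")

def in_domain(url):
    return domain.search(url) is not None

print(in_domain("http://www.orange.com/index.html"))  # True
print(in_domain("http://a.b.orange.com/"))            # True
print(in_domain("http://orange.com/"))                # False: no dot before "orange"
print(in_domain("http://www.orange.org/"))            # False
# Because .* may also match slashes, the pattern accepts a URL that merely
# contains ".orange.com/" somewhere in its path:
print(in_domain("http://elsewhere.example/x.orange.com/"))  # True
```

If the last case matters to you, replacing .* with [^/]* confines the wildcard to the host name. This assumes Big Brother's regexp syntax supports negated character classes, which I have not verified here.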

Note that an address is checked against the recursion regexp after the mappings have been applied to it. So, if you are using mappings, say, to map a remote site to a folder on your hard disk, the recursion regexp should describe the addresses of the local files, not those of the remote ones. For instance, using ^file: as the recursion regexp simply allows recursion on all local files.
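
The order of operations can be sketched as follows. This is a simplified model, not Big Brother's code: it assumes a single mapping that works by plain prefix substitution, and MAP_FROM, MAP_TO and the URLs are hypothetical values chosen for illustration.

```python
import re

# Hypothetical mapping: a remote site mirrored in a local folder.
MAP_FROM = "http://www.example.org/"
MAP_TO = "file:///home/me/site/"

def apply_mapping(url):
    # Assumed semantics: replace the remote prefix with the local one.
    if url.startswith(MAP_FROM):
        return MAP_TO + url[len(MAP_FROM):]
    return url

recursion = re.compile(r"^file:")

def should_recurse(url):
    # The recursion regexp sees the address only after mapping.
    return recursion.search(apply_mapping(url)) is not None

print(should_recurse("http://www.example.org/docs/a.html"))  # True: mapped to a file: address
print(should_recurse("http://elsewhere.example/"))           # False: unmapped, still http:
```

In this model, a recursion regexp written against the remote address (for example ^http://www\.example\.org/) would never match a mapped link, which is why the regexp has to describe the local addresses.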


François Pottier, May 5, 2004
