How do "fetchers" work?

A "fetcher" is a simple object responsible for delivering external files to the script. The default fetcher supplied with html2ps/pdf fetches HTML, images and CSS from remote sites over HTTP. If you write your own fetcher, you need to implement a 'get_data' function that returns the contents of the requested file and, in most cases, a 'get_base_url' function that returns the URL used as the base when resolving relative URLs in the most recently fetched HTML file.
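The two-function contract described above can be sketched as follows. This is an illustration in Python, not html2ps code (html2ps itself is written in PHP); the class name StaticFetcher, the sample URL and the exact return types are assumptions made for the example.

```python
from urllib.parse import urljoin


class StaticFetcher:
    """Hypothetical fetcher that serves files from a dict instead of HTTP."""

    def __init__(self, files):
        self._files = files   # maps absolute URL -> file contents
        self._base_url = ""   # URL of the most recently fetched file

    def get_data(self, url):
        # Return the contents of the requested file and remember its URL.
        self._base_url = url
        return self._files[url]

    def get_base_url(self):
        # Base for resolving relative URLs found in the last fetched file.
        return self._base_url


fetcher = StaticFetcher({
    "http://test.com/docs/index.html": '<img src="img/logo.png">',
})
html = fetcher.get_data("http://test.com/docs/index.html")

# A relative link in the page is resolved against the base URL.
logo_url = urljoin(fetcher.get_base_url(), "img/logo.png")
print(logo_url)  # http://test.com/docs/img/logo.png
```

The point of 'get_base_url' is visible in the last step: without a base, "img/logo.png" could not be turned into a URL that the fetcher can be asked for.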

The image below illustrates a simple html2ps session that uses the default fetcher to convert an HTML file from a hypothetical test.com site.

If your pages are stored on the local system, or are generated dynamically and kept in memory, you don't need HTTP to fetch them. In this case you should use a custom fetcher, and the session will look similar to the image below. Note that this fetcher handles every request and returns valid content for each one; this distinguishes it from the very simple fetcher supplied with html2ps, which always returns the same in-memory string regardless of what was requested. The internals of a fully featured fetcher depend heavily on your system architecture, so such a fetcher will most likely never be included in the html2ps distribution.
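As one possible shape for such a custom fetcher, the sketch below maps the path part of each requested URL onto a local directory. It is written in Python for illustration only and is not part of html2ps; the class name LocalFileFetcher, the directory layout and the test.com URLs are assumptions.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


class LocalFileFetcher:
    """Hypothetical fetcher that maps URL paths onto a local directory."""

    def __init__(self, root):
        self._root = Path(root)

    def get_data(self, url):
        # "http://test.com/css/a.css" -> <root>/css/a.css
        relative = urlparse(url).path.lstrip("/")
        return (self._root / relative).read_bytes()


# Build a tiny throwaway "site" on disk to demonstrate.
root = tempfile.mkdtemp()
Path(root, "index.html").write_text("<p>Hello</p>")
Path(root, "style.css").write_text("p { margin: 0; }")

fetcher = LocalFileFetcher(root)
page = fetcher.get_data("http://test.com/index.html")
css = fetcher.get_data("http://test.com/style.css")
```

Every request, whether for the page or for its stylesheet, is answered with the matching file. A real implementation would also need to reject paths that escape the root directory and decide how to report missing files.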

The image below illustrates why images and external stylesheets are not rendered when you use a fetcher object that is too simple.
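The failure mode can also be reproduced in a few lines. The following Python sketch is illustrative and is not html2ps code; SingleStringFetcher and SiteFetcher are hypothetical names, and the in-memory "site" is made up for the example.

```python
class SingleStringFetcher:
    """Too-simple fetcher: returns the same string for every request."""

    def __init__(self, content):
        self._content = content

    def get_data(self, url):
        return self._content  # the requested URL is ignored


class SiteFetcher:
    """Fuller fetcher: looks each requested URL up in an in-memory 'site'."""

    def __init__(self, files):
        self._files = files

    def get_data(self, url):
        # Return None for unknown files so the caller can skip them.
        return self._files.get(url)


site = {
    "http://test.com/index.html": '<link rel="stylesheet" href="style.css">',
    "http://test.com/style.css": "body { color: navy; }",
}

simple = SingleStringFetcher(site["http://test.com/index.html"])
full = SiteFetcher(site)

# The simple fetcher hands back the HTML page even when the stylesheet
# was requested, so there is no usable CSS to apply.
css_from_simple = simple.get_data("http://test.com/style.css")
css_from_full = full.get_data("http://test.com/style.css")
```

Here css_from_simple contains the HTML page again, while css_from_full contains the actual stylesheet text. The same mismatch applies to image requests.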

Sometimes you need to fetch files from different places; for example, the HTML code is generated locally, while images and CSS files must be fetched over HTTP. In this case you need to use several fetchers at once, as illustrated below. Note that here you must implement a 'get_base_url' function that returns the correct URL, so that the script can resolve relative URLs contained in the HTML code.
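One way to combine several fetchers is to ask each in turn and take the first answer. The Python sketch below is a hedged illustration of that idea, not the mechanism html2ps uses; all class names are hypothetical, and RemoteStubFetcher stands in for a real HTTP fetcher so that the example runs without network access.

```python
from urllib.parse import urljoin


class GeneratedPageFetcher:
    """Serves one locally generated page and declines everything else."""

    def __init__(self, page_url, html):
        self._page_url = page_url
        self._html = html

    def get_data(self, url):
        return self._html if url == self._page_url else None

    def get_base_url(self):
        # The URL the generated page pretends to live at.
        return self._page_url


class RemoteStubFetcher:
    """Stand-in for an HTTP fetcher (assumption: a real one would do I/O)."""

    def __init__(self, remote_files):
        self._remote_files = remote_files

    def get_data(self, url):
        return self._remote_files.get(url)


class ChainedFetcher:
    """Asks each fetcher in order and returns the first non-None result."""

    def __init__(self, fetchers):
        self._fetchers = fetchers

    def get_data(self, url):
        for fetcher in self._fetchers:
            data = fetcher.get_data(url)
            if data is not None:
                return data
        return None


page_url = "http://test.com/report/index.html"
local = GeneratedPageFetcher(page_url, '<img src="../img/chart.png">')
remote = RemoteStubFetcher({"http://test.com/img/chart.png": b"PNG bytes"})
chain = ChainedFetcher([local, remote])

html = chain.get_data(page_url)  # answered by the local fetcher
image_url = urljoin(local.get_base_url(), "../img/chart.png")
image = chain.get_data(image_url)  # answered by the remote stand-in
```

The base URL returned for the generated page is what turns "../img/chart.png" into "http://test.com/img/chart.png", a URL the second fetcher can serve. If the base URL were wrong or empty, the image request would not match anything.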