<html>
<head>
<title>API description</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {
  margin: 0.5em 0;
}

div.class {
  margin: 0.5em 0 0.5em 2em;
}

div.interface {
  margin: 1em 0 0.5em 0;
  padding: 2px 5px;
  background-color: #f0f0f0;
}

span.interface_name {
  font-weight: bold;
}

span.method_name {
  font-weight: bold;
}
</style>
</head>
29<body>
30
<h1>Beware: GLOBALS!</h1>
<p>
At the moment, the layout/conversion engine relies on several global variables:
<ul>
<li>the $g_config array (in particular, the $g_config['renderforms'], $g_config['renderlinks'], $g_config['renderimages'],
            $g_config['debugbox'], $g_config['mode'], $g_config['cssmedia'] and $g_config['draw_page_border']
            elements for all output methods, and $g_config['ps2pdf'] and $g_config['transparency_workaround'] for
            the 'fastps' output method)</li>
<li>$g_px_scale</li>
<li>$g_pt_scale</li>
</ul>
Please take this into account when using the API. We plan to get rid of these globals eventually. Until then,
you may initialize them with the code from the samples above.
</p>
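<p>
As a rough illustration of such initialization, the sketch below sets every key listed above. The key names
come from the list; all values, the meaning assumed for $g_px_scale and the page dimensions are assumptions made
for this example only, so prefer the values used in the shipped samples.
</p>
<pre class="code">
// Illustrative values only; the keys are the ones listed above.
$g_config = array(
  'renderforms'      => false,
  'renderlinks'      => true,
  'renderimages'     => true,
  'debugbox'         => false,
  'mode'             => 'html',    // assumed value
  'cssmedia'         => 'screen',  // assumed value
  'draw_page_border' => false,
  // consulted by the 'fastps' output method only
  'ps2pdf'                  => false,
  'transparency_workaround' => false
);

// Assumption: $g_px_scale is the number of points per source pixel.
$page_width_pt = 595.0;  // hypothetical printable width, in points
$page_width_px = 1024;   // hypothetical virtual screen width, in pixels
$g_px_scale = $page_width_pt / $page_width_px;

// Placeholder only: take the real expression from the shipped samples.
$g_pt_scale = $g_px_scale;
</pre>
<p>
If this code runs inside a function, declare the variables with the <b>global</b> keyword first; otherwise
the engine will not see them.
</p>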
45<p>
The script also initializes a number of global items itself:
47<ul>
48<li>$g_box_uid</li>
49<li>$g_colors</li>
50<li>$__g_css_manager</li>
51<li>$__g_css_handler_set</li>
52<li>$g_encoding_aliases</li>
53<li>$g_frame_level</li>
54<li>$g_font_resolver</li>
55<li>$g_font_resolver_pdf</li>
56<li>$g_html_entities</li>
57<li>$g_image_cache</li>
58<li>$g_last_assigned_font_id</li>
59<li>$g_manager_encodings</li>
60<li>$g_media</li>
61<li>$g_predefined_media</li>
62<li>$g_stylesheet_title</li>
63<li>$g_tag_attrs</li>
64<li>$g_unicode_glyphs</li>
65<li>$g_utf8_converters</li>
66</ul>
There's no need to initialize or modify these variables; just don't accidentally overwrite them. Some of them
are here for "historical" reasons and will eventually be removed. Others are here due to the lack of static class variables
in older PHP versions.
70</p>
71
72<h1>Conversion pipeline</h1>
<div>
<b>PipelineFactory</b> is a small factory class that simplifies building <b>Pipeline</b> instances;
<b>create_default_pipeline()</b> builds a simple, ready-to-run conversion pipeline. Using
<b>PipelineFactory</b> is not required; you may create the <b>Pipeline</b> object and fill in
the appropriate fields manually.

<pre class="code">
class PipelineFactory {
  function create_default_pipeline();
}
</pre>
</div>
85
<div>
The <b>Pipeline</b> class describes the conversion process as a whole; it holds references to the classes described
below and is responsible for calling them in the correct order and for error handling.
<pre class="code">
class Pipeline {
  var $fetchers;
  var $data_filters;
  var $parser;
  var $pre_tree_filters;
  var $layout_engine;
  var $post_tree_filters;
  var $output_driver;
  var $output_filter;
  var $destination;

  function Pipeline();

  function configure($options);
  function process($data_id, &$media);
  function process_batch($data_id_array, &$media);
  function error_message();

  function &get_dispatcher();
}
</pre>
</div>
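<p>
To illustrate what "calling them in the correct order" with error handling can look like, here is a minimal,
self-contained sketch. <b>SketchPipeline</b> and its stages are hypothetical stand-ins written for this
example; they are not the real <b>Pipeline</b> implementation, and the convention that a stage returns null
on failure is an assumption.
</p>
<pre class="code">
// Hypothetical stand-in: every stage is a callable that returns its
// result, or null on failure.
class SketchPipeline {
  public $stages = array();  // stage name => callable, in execution order
  public $error = '';

  function process($data) {
    foreach ($this->stages as $name => $stage) {
      $data = call_user_func($stage, $data);
      if ($data === null) {
        // Stop at the first failing stage and remember why
        $this->error = "Stage '" . $name . "' failed";
        return null;
      }
    }
    return $data;
  }

  function error_message() {
    return $this->error;
  }
}

$pipeline = new SketchPipeline();
$pipeline->stages['fetch']  = function ($id)   { return 'hello ' . $id; };
$pipeline->stages['filter'] = function ($text) { return strtoupper($text); };
$pipeline->stages['output'] = function ($text) { return '[' . $text . ']'; };

$result = $pipeline->process('world');  // '[HELLO WORLD]'
</pre>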
112
114
115<h1>Description of interfaces and classes</h1>
116
<div class="note">
Almost all interfaces described below include an
<span class="method_name">error_message</span> method.
It should return a user-readable description of
the error. This description MAY contain HTML tags, but should remain
readable if the tags are removed.
</div>
124
<div class="interface">
<p>The <span class="interface_name">Fetcher</span> interface provides a method of
fetching the data required
to build a document tree. Normally, classes implementing this interface
fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server,
a local file or a database). Nevertheless, a fetcher MAY return ANY data, provided that
the parser understands it. The pipeline object may contain
several fetcher objects; in this case they are tried one by one until
one of them returns a non-null value.</p>

<p>If you need to get data from non-standard places (e.g. from a template engine or a database),
implement <span class="interface_name">Fetcher</span> in your own class.</p>

<p>
Note that the <b>get_data</b> method returns a <b>FetchedData</b> object (or one of its descendants), not an
HTML string!
</p>
</div>
143
144<img src="UML/Fetchers.PNG"/>
145
<dl>
<dt>get_data($data_id)</dt>
<dd>
Fetches the data identified by $data_id and returns the page content together with supplementary information.
<ul>
<li>$data_id &ndash; URI identifying the page location</li>
</ul>
</dd>

<dt>get_base_url()</dt>
<dd>Returns the URL to be used as the base URL when resolving relative links.</dd>
</dl>
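<p>
The "tried one by one until one returns a non-null value" rule can be sketched as follows.
<b>SketchFetcherMemory</b> and <b>fetch_first</b> are hypothetical names used only in this example, and
for brevity the stand-in fetcher returns plain strings, whereas a real fetcher returns a <b>FetchedData</b> object.
</p>
<pre class="code">
// Hypothetical stand-in fetcher serving pages from an in-memory array
class SketchFetcherMemory {
  public $pages;

  function __construct($pages) {
    $this->pages = $pages;
  }

  function get_data($data_id) {
    // null tells the caller to try the next fetcher
    return isset($this->pages[$data_id]) ? $this->pages[$data_id] : null;
  }
}

// Ask each fetcher in turn; the first non-null answer wins
function fetch_first($fetchers, $data_id) {
  foreach ($fetchers as $fetcher) {
    $data = $fetcher->get_data($data_id);
    if ($data !== null) {
      return $data;
    }
  }
  return null;
}

$fetchers = array(
  new SketchFetcherMemory(array('a.html' => 'page A')),
  new SketchFetcherMemory(array('a.html' => 'shadowed', 'b.css' => 'body {}'))
);

$first   = fetch_first($fetchers, 'a.html');  // 'page A' (first fetcher wins)
$second  = fetch_first($fetchers, 'b.css');   // 'body {}' (falls through)
$missing = fetch_first($fetchers, 'c.png');   // null (nobody has it)
</pre>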
158
<div class="class">
<b>FetcherURL</b> reads a remote HTML page via HTTP or HTTPS.
</div>

<div class="class">
<b>FetcherLocalFile</b> reads a local file; in this case $data_id should contain the path to the file to be read.
</div>
166
<div class="interface">
The <b>DataFilter</b> interface describes filters that modify the raw input data.
The main purpose of these filters is to fix the raw data so that the parser can
process it without errors.
</div>
172
173<img src="UML/Data_filters.PNG"/>
174
175<dl>
176<dt>process($data)</dt>
177<dd>
Processes the FetchedData object and returns another FetchedData object with (possibly) modified content.
179<ul>
180<li>$data &ndash; FetchedData object</li>
181</ul>
182</dd>
183</dl>
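<p>
A minimal sketch of this contract: each filter takes a data object and returns a new one, and filters are applied
in sequence. <b>SketchFetchedData</b> and both filters are hypothetical stand-ins invented for this example; the real
<b>FetchedData</b> class and its accessors may look different.
</p>
<pre class="code">
// Hypothetical stand-in for FetchedData: just wraps a content string
class SketchFetchedData {
  public $content;

  function __construct($content) {
    $this->content = $content;
  }
}

// Removes a UTF-8 byte order mark from the start of the content, if any
class SketchDataFilterStripBOM {
  function process($data) {
    $content = $data->content;
    if (substr($content, 0, 3) === "\xEF\xBB\xBF") {
      $content = substr($content, 3);
    }
    return new SketchFetchedData($content);
  }
}

// Removes leading and trailing whitespace
class SketchDataFilterTrim {
  function process($data) {
    return new SketchFetchedData(trim($data->content));
  }
}

$filters = array(new SketchDataFilterStripBOM(), new SketchDataFilterTrim());
$data = new SketchFetchedData("\xEF\xBB\xBF  plain text\n");

// Each filter receives the output of the previous one
foreach ($filters as $filter) {
  $data = $filter->process($data);
}
// $data->content is now 'plain text'
</pre>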
184
185<div class="class">
186<b>DataFilterDoctype</b> tries to detect the mode this document should be rendered in (HTML, XHTML, QUIRKS).
187</div>
188
<div class="class">
<b>DataFilterHTML2XHTML</b>
A precise description of this filter's actions is beyond the scope of this
document. In general, it turns the input document into a well-formed XML document
(possibly throwing out invalid parts along the way). Note that this is achieved
by extensive use of regular expressions; no XML/HTML parsers are involved
in the conversion at this stage.
</div>
197
<div class="class">
<b>DataFilterXHTML2XHTML</b> does some additional XHTML processing required by the
script; for example, it removes comments and SCRIPT tags and performs some other steps that simplify
document processing.
</div>

<div class="class">
<b>DataFilterUTF8</b> converts content from the source encoding to UTF-8. It is a good idea
to use this filter if your content is not limited to ASCII.
</div>
208
209<div class="interface">
210<b>Parser</b> interface provides a method of building the DOM tree from the
211filtered data.
212</div>
213
214<img src="UML/Parsers.PNG"/>
215
216<dl>
217<dt>process($data)</dt>
218<dd>
219Processes the FetchedData object and returns the document tree (somewhat similar to DOM) object.
220<ul>
221<li>$data &ndash; FetchedData object</li>
222</ul>
223</dd>
224</dl>
225
226<div class="class">
227<b>ParserXHTML</b>
228</div>
229
230<div class="interface">
231<b>PreTreeFilter</b> interface describes a procedure of document tree transformation executed before
232the layout engine starts.
233</div>
234
235<img src="UML/Pre_filters.PNG"/>
236
237<dl>
238<dt>process($data)</dt>
239<dd>
Makes modifications to the document tree (in place) before the layout engine runs.
241<ul>
242<li>$data &ndash; Document tree object</li>
243</ul>
244</dd>
245</dl>
246
<div class="class" id="filter-pre-html2ps-fields">
<b>PreTreeFilterHTML2PSFields</b> handles the processing
of special fields (such as date, page count, page number, etc.).
</div>
251
252<div class="class">
253<b>PreTreeFilterHeaderFooter</b> adds script-generated header and footer to the document tree.
254</div>
255
<div class="interface">
The <b>LayoutEngine</b> interface describes a class that processes
the document tree and calculates the positions of page elements. In theory, different implementations
of this interface will allow us to use &quot;lightweight&quot; layout engines in case we do
not need full HTML/CSS support.
</div>
262
263<img src="UML/Layout_engines.PNG"/>
264
265<dl>
266<dt>process($data)</dt>
267<dd>
268Runs the layout process (document tree object is modified in-place).
269<ul>
270<li>$data &ndash; Document tree object</li>
271</ul>
272</dd>
273</dl>
274
<div class="class">
<b>LayoutEngineDefault</b> is the standard layout engine used by HTML2PS.
</div>
278
279<div class="interface">
280<b>PostTreeFilter</b> interface describes a procedure of document tree transformation executed after
281the layout engine completes.
282</div>
283
284<img src="UML/Post_filters.PNG"/>
285
286<dl>
287<dt>process($data)</dt>
288<dd>
Applies changes to the document tree (in place) after the layout engine has run.
290<ul>
291<li>$data &ndash; document tree object</li>
292</ul>
293</dd>
294</dl>
295
<div class="interface">
The <b>OutputDriver</b> interface contains device-specific functions: drawing, movement, font selection, etc.
In general, a description of this interface is beyond the scope of this document, as users are not expected
to implement it themselves. Instead, they should use the predefined output drivers described below.
</div>
301
302<img src="UML/Output_drivers.PNG"/>
303
304<div class="class">
305<b>OutputDriverPDFLIB</b> outputs PDF using PDFLIB.
306</div>
307
<div class="class">
<b>OutputDriverFPDF</b> outputs PDF using FPDF.
</div>

<div class="class">
<b>OutputDriverFastPS</b> handles PostScript Level 3 output.
</div>

<div class="class">
<b>OutputDriverFastPSLevel2</b> handles PostScript Level 2 output.
</div>
319
320<div class="interface">
The <b>OutputFilter</b> interface describes a filter applied to the generated PS or PDF file.
322</div>
323
324<img src="UML/Output_filters.PNG"/>
325
<div class="class">
<b>OutputFilterPS2PDF</b> runs the PS2PDF utility on the generated file.
</div>

<div class="class">
<b>OutputFilterGZIP</b> compresses the generated file using ZLIB.
</div>
333
334<div class="interface">
335<b>Destination</b> interface describes the &quot;channel&quot; object which determines where the final output file
336should be placed.
337</div>
338
339<img src="UML/Destinations.PNG"/>
340
<div class="class">
<b>DestinationBrowser</b> outputs the generated file directly to the browser.
</div>

<div class="class">
<b>DestinationDownload</b> also outputs the generated file to the browser.
Unlike <b>DestinationBrowser</b>, this class sends headers that prevent the file from being opened directly
in the browser window.
</div>

<div class="class">
<b>DestinationFile</b> saves the generated file on the server side.
</div>
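<p>
The difference between showing a file in the browser and forcing a download usually comes down to the HTTP
<tt>Content-Disposition</tt> header. The sketch below shows that general technique; it is an assumption about how
such destinations typically work, not a description of the headers these classes actually send, and
<b>sketch_destination_headers</b> is a hypothetical helper.
</p>
<pre class="code">
// Hypothetical helper: builds the header lines for either behaviour
function sketch_destination_headers($filename, $content_type, $download) {
  // 'inline' lets the browser display the file; 'attachment' forces a save dialog
  $disposition = $download ? 'attachment' : 'inline';
  return array(
    'Content-Type: ' . $content_type,
    'Content-Disposition: ' . $disposition . '; filename="' . $filename . '"'
  );
}

$browser  = sketch_destination_headers('out.pdf', 'application/pdf', false);
$download = sketch_destination_headers('out.pdf', 'application/pdf', true);
</pre>
<p>
In a real script, each line would be passed to PHP's <b>header()</b> function before any part of the file body is sent.
</p>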
354
<h2>Implementing your own fetcher class</h2>
<p>
Sometimes you may need to convert HTML code taken from a database or from another non-standard source.
In this case, implement the <b>Fetcher</b> interface yourself and return the content to be converted,
wrapped in a <b>FetchedData</b> object, from the <span class="method_name">get_data</span> method. Additional parameters (database connection settings,
template variables, etc.) may be specified as globals (not recommended), passed as parameters
to the constructor of the fetcher object, or encoded in the $data_id parameter of the <span class="method_name">get_data</span> method.
</p>
<p>
Keep in mind that if your HTML code references other files (e.g. stylesheets or images), your fetcher should either
return null for the URLs of these files, so that the next fetcher in the pipeline can handle them, or fetch them itself.
Otherwise, these files will not be available.
</p>
368
<pre>
// Fetcher that serves the content of a single local file
class MyFetcherLocalFile extends Fetcher {
  var $_content;

  // Read the whole file once, when the fetcher is created
  function MyFetcherLocalFile($file) {
    $this->_content = file_get_contents($file);
  }

  // $data_id is ignored: the same content is returned for every request
  function get_data($data_id) {
    return new FetchedDataURL($this->_content, array(), "");
  }

  // No base URL is reported
  function get_base_url() {
    return "";
  }
}
</pre>
386
See also the <tt>sample.simplest.from.file.php</tt> and <tt>sample.simplest.from.memory.php</tt> files.
388
389<h1>Class dependencies</h1>
390The pipeline object contains the following:
391<ul>
392<li>one or more objects implementing <b>Fetcher</b> interface;</li>
393<li>zero or more objects implementing <b>DataFilter</b> interface;</li>
394<li>one object implementing <b>Parser</b> interface;</li>
395<li>zero or more objects implementing <b>PreTreeFilter</b> interface;</li>
396<li>one object implementing <b>LayoutEngine</b> interface;</li>
397<li>zero or more objects implementing <b>PostTreeFilter</b> interface;</li>
398<li>one object implementing <b>OutputDriver</b> interface;</li>
<li>one object implementing <b>Destination</b> interface.</li>
400</ul>
401
<p>
There are no other dependencies between classes and interfaces (except &quot;implements&quot;).
</p>
<p>
Note that the order of filters is important. Imagine you are using some kind of tree filter which adds a header block
containing HTML2PS-specific fields. In this case you must add this filter before PreTreeFilterHTML2PSFields, or
you'll get raw field codes in the generated output.
</p>
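<p>
The effect of filter order can be reproduced with a toy model. Everything in this sketch is a hypothetical
stand-in: the tree is a flat list of strings, the <tt>##PAGE##</tt> field syntax is assumed for the example, and the
two filters only mimic the roles of a header-adding filter and a field-expanding filter.
</p>
<pre class="code">
// Hypothetical stand-in for the document tree
class SketchTree {
  public $nodes = array();
}

// Adds a header that contains a field code
class SketchFilterHeader {
  function process($tree) {
    array_unshift($tree->nodes, 'Page ##PAGE##');
  }
}

// Expands field codes found anywhere in the tree
class SketchFilterFields {
  function process($tree) {
    foreach ($tree->nodes as $i => $text) {
      $tree->nodes[$i] = str_replace('##PAGE##', '1', $text);
    }
  }
}

// Runs the filters in the given order on a fresh one-node tree
function run_filters($filters) {
  $tree = new SketchTree();
  $tree->nodes[] = 'Body text';
  foreach ($filters as $filter) {
    $filter->process($tree);  // the tree is modified in place
  }
  return $tree->nodes;
}

$good = run_filters(array(new SketchFilterHeader(), new SketchFilterFields()));
$bad  = run_filters(array(new SketchFilterFields(), new SketchFilterHeader()));
// $good[0] is 'Page 1'; $bad[0] is still the raw 'Page ##PAGE##'
</pre>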
407
408</body>
409</html>