<html>
<head>
<title>API description</title>
<link rel="stylesheet" type="text/css" media="all" title="Default" href="css/help.css"/>
<style type="text/css">
div.note {
  margin: 0.5em 0;
}

div.class {
  margin: 0.5em 0 0.5em 2em;
}

div.interface {
  margin: 1em 0 0.5em 0;
  padding: 2px 5px;
  background-color: #f0f0f0;
}

span.interface_name {
  font-weight: bold;
}

span.method_name {
  font-weight: bold;
}
</style>
</head>
<body>

<h1>Beware: GLOBALS!</h1>
<p>
At the moment, the layout/conversion engine makes use of several global variables:
<ul>
<li>the $g_config array: in particular, the $g_config['renderforms'], $g_config['renderlinks'], $g_config['renderimages'],
    $g_config['debugbox'], $g_config['mode'], $g_config['cssmedia'] and $g_config['draw_page_border']
    elements for all output methods, and $g_config['ps2pdf'] and $g_config['transparency_workaround'] for the
    'fastps' output method</li>
<li>$g_px_scale</li>
<li>$g_pt_scale</li>
</ul>
Please take this into account when using the API. We plan to get rid of these globals eventually. For the time being,
you may initialize these globals with the code from the samples.
</p>
<p>
Also, there are some global items that the script initializes itself:
<ul>
<li>$g_box_uid</li>
<li>$g_colors</li>
<li>$__g_css_manager</li>
<li>$__g_css_handler_set</li>
<li>$g_encoding_aliases</li>
<li>$g_frame_level</li>
<li>$g_font_resolver</li>
<li>$g_font_resolver_pdf</li>
<li>$g_html_entities</li>
<li>$g_image_cache</li>
<li>$g_last_assigned_font_id</li>
<li>$g_manager_encodings</li>
<li>$g_media</li>
<li>$g_predefined_media</li>
<li>$g_stylesheet_title</li>
<li>$g_tag_attrs</li>
<li>$g_unicode_glyphs</li>
<li>$g_utf8_converters</li>
</ul>
There's no need to initialize or modify these variables; just don't accidentally overwrite them. Some of them
are here for "historical" reasons and will eventually be removed.
Some are here due to the lack of static class variables
in older PHP versions.
</p>

<h1>Conversion pipeline</h1>
<div>
<b>PipelineFactory</b> is a simple factory class that simplifies building <b>Pipeline</b> instances;
<b>create_default_pipeline()</b> builds a simple ready-to-run conversion pipeline. Using
<b>PipelineFactory</b> is not required; you may create the <b>Pipeline</b> object and fill in
the appropriate fields manually.

<pre class="code">
class PipelineFactory {
  function create_default_pipeline();
}
</pre>
</div>

<div>
The <b>Pipeline</b> class describes the conversion process as a whole; it contains references to the classes described
below and is responsible for calling them in the correct order and for error handling.
<pre class="code">
class Pipeline {
  var $fetchers;
  var $data_filters;
  var $parser;
  var $pre_tree_filters;
  var $layout_engine;
  var $post_tree_filters;
  var $output_driver;
  var $output_filter;
  var $destination;

  function Pipeline();

  function configure($options);
  function process($data_id, &$media);
  function process_batch($data_id_array, &$media);
  function error_message();

  function &get_dispatcher();
}
</pre>
</div>

<h1>Description of interfaces and classes</h1>

<div class="note">
Almost all interfaces described below include an
<span class="method_name">error_message</span> method.
It should return a user-readable description of
the error. This description MAY contain HTML tags, but should remain
readable if the tags are removed.
</div>

<div class="interface">
<p>The <span class="interface_name">Fetcher</span> interface provides a method of
fetching the data required
to build a document tree. Normally, classes implementing this interface
fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server,
a local file or a database).
Nevertheless, it MAY fetch ANY data, provided that
the parser can understand this data. The pipeline object may contain
several fetcher objects; in this case they are tried one by one until
one of them returns a non-null value.</p>

<p>If you need to get data from non-standard places (e.g. from a template engine or a database), you
should implement <span class="interface_name">Fetcher</span> in your own class.</p>

<p>
Note that the <b>get_data</b> method returns a <b>FetchedData</b> object (or one of its descendants), not an
HTML string!
</p>
</div>

<img src="UML/Fetchers.PNG"/>

<dl>
<dt>get_data($data_id)</dt>
<dd>
Fetches the URL and returns the page content and supplementary information.
<ul>
<li>$data_id – URI identifying the page location</li>
</ul>
</dd>

<dt>get_base_url()</dt>
<dd>Returns the URL to be used as the base URL when resolving relative links.</dd>
</dl>

<div class="class">
<b>FetcherURL</b> reads a remote HTML page via HTTP or HTTPS.
</div>

<div class="class">
<b>FetcherLocalFile</b> reads a local file; in this case $data_id should contain the path to the file to be read.
</div>

<div class="interface">
The <b>DataFilter</b> interface describes filters that modify the raw input data.
The main purpose of these filters is to fix the raw data so that the parser can
process it without errors.
</div>

<img src="UML/Data_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes the FetchedData object and returns another FetchedData object with (possibly) modified content.
<ul>
<li>$data – FetchedData object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>DataFilterDoctype</b> tries to detect the mode this document should be rendered in (HTML, XHTML, QUIRKS).
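<p>
As a rough illustration of the <b>DataFilter</b> interface described above, the sketch below shows what a custom
filter might look like. It is not code from the library: <tt>SketchFetchedData</tt>, <tt>SketchDataFilterStripBOM</tt>
and <tt>sketch_apply_filters</tt> are hypothetical names, the stand-in data class only mimics the real
<b>FetchedData</b>, and its <tt>get_content</tt> accessor is an assumption. The sketch uses PHP 5 syntax so that it
can run on its own.
</p>
<pre class="code">
// Hypothetical stand-in for the library's FetchedData class, defined here
// only so that the sketch runs on its own.
class SketchFetchedData {
  private $content;

  public function __construct($content) {
    $this->content = $content;
  }

  public function get_content() {
    return $this->content;
  }
}

// A filter in the spirit of the DataFilter interface: process() takes a
// data object and returns another one with (possibly) modified content.
class SketchDataFilterStripBOM {
  public function process($data) {
    $content = $data->get_content();

    // Drop a leading UTF-8 byte order mark (EF BB BF), if present.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
      $content = substr($content, 3);
    }

    return new SketchFetchedData($content);
  }
}

// Assumption: filters are applied in the order in which they are listed.
function sketch_apply_filters($filters, $data) {
  foreach ($filters as $filter) {
    $data = $filter->process($data);
  }
  return $data;
}

$raw = new SketchFetchedData("\xEF\xBB\xBFHello");
$clean = sketch_apply_filters(array(new SketchDataFilterStripBOM()), $raw);
echo $clean->get_content(), "\n"; // prints "Hello"
</pre>
<p>
In a real pipeline, such a filter would presumably be added to the <tt>$data_filters</tt> field of the
<b>Pipeline</b> object rather than called by hand.
</p>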
</div>

<div class="class">
<b>DataFilterHTML2XHTML</b>: a precise description of this filter's actions is beyond the scope of this
document. In general, it makes the input document a well-formed XML document
(possibly discarding invalid parts along the way). Note that this is achieved
through extensive use of regular expressions; no XML/HTML parsers are involved
in the conversion at this stage.
</div>

<div class="class">
<b>DataFilterXHTML2XHTML</b> does some additional XHTML processing required by the
script; for example, it removes comments and SCRIPT tags and performs some other steps that simplify
document processing.
</div>

<div class="class">
<b>DataFilterUTF8</b> converts content from the source encoding to UTF-8. It is a good idea
to use this filter if your content is not limited to ASCII.
</div>

<div class="interface">
The <b>Parser</b> interface provides a method of building the DOM tree from the
filtered data.
</div>

<img src="UML/Parsers.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Processes the FetchedData object and returns the document tree object (somewhat similar to DOM).
<ul>
<li>$data – FetchedData object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>ParserXHTML</b>
</div>

<div class="interface">
The <b>PreTreeFilter</b> interface describes a document tree transformation executed before
the layout engine starts.
</div>

<img src="UML/Pre_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Modifies the document tree (in place) before the layout engine runs.
<ul>
<li>$data – Document tree object</li>
</ul>
</dd>
</dl>

<div class="class" id="filter-pre-html2ps-fields">
<b>PreTreeFilterHTML2PSFields</b> handles the processing
of special fields (such as date, page count, page number, etc.).
</div>

<div class="class">
<b>PreTreeFilterHeaderFooter</b> adds a script-generated header and footer to the document tree.
</div>

<div class="interface">
The <b>LayoutEngine</b> interface describes a class that processes
the document tree and calculates the positions of page elements. In theory, different implementations
of this interface will allow us to use "lightweight" layout engines when we do
not need full HTML/CSS support.
</div>

<img src="UML/Layout_engines.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Runs the layout process (the document tree object is modified in place).
<ul>
<li>$data – Document tree object</li>
</ul>
</dd>
</dl>

<div class="class">
<b>LayoutEngineDefault</b> is the standard layout engine HTML2PS uses.
</div>

<div class="interface">
The <b>PostTreeFilter</b> interface describes a document tree transformation executed after
the layout engine completes.
</div>

<img src="UML/Post_filters.PNG"/>

<dl>
<dt>process($data)</dt>
<dd>
Applies changes to the document tree (in place) after the layout engine has run.
<ul>
<li>$data – Document tree object</li>
</ul>
</dd>
</dl>

<div class="interface">
The <b>OutputDriver</b> interface contains device-specific functions: drawing, movement, font selection, etc.
In general, a description of this interface is beyond the scope of this document, as users are not expected
to implement this interface themselves. Instead, they would use the predefined output drivers described below.
</div>

<img src="UML/Output_drivers.PNG"/>

<div class="class">
<b>OutputDriverPDFLIB</b> outputs PDF using PDFLIB.
</div>

<div class="class">
<b>OutputDriverFPDF</b> outputs PDF using FPDF.
</div>

<div class="class">
<b>OutputDriverFastPS</b> handles PostScript Level 3 output.
</div>

<div class="class">
<b>OutputDriverFastPSLevel2</b> handles PostScript Level 2 output.
</div>

<div class="interface">
The <b>OutputFilter</b> interface describes a filter applied to the generated PS or PDF file.
</div>

<img src="UML/Output_filters.PNG"/>

<div class="class">
<b>OutputFilterPS2PDF</b> runs the PS2PDF utility on the generated file.
</div>

<div class="class">
<b>OutputFilterGZIP</b> compresses the generated file using ZLIB.
</div>

<div class="interface">
The <b>Destination</b> interface describes the "channel" object which determines where the final output file
should be placed.
</div>

<img src="UML/Destinations.PNG"/>

<div class="class">
<b>DestinationBrowser</b> outputs the generated file directly to the browser.
</div>

<div class="class">
<b>DestinationDownload</b> outputs the generated file directly to the browser.
Unlike <b>DestinationBrowser</b>, this class sends headers that prevent the file from being opened directly
in the browser window.
</div>

<div class="class">
<b>DestinationFile</b> saves the generated file on the server side.
</div>

<h2>Implementing your own fetcher class</h2>
<p>
Sometimes you may need to convert HTML code taken from a database or from other non-standard sources.
In this case you should implement the <b>Fetcher</b> interface yourself, returning the content to be converted
(wrapped in a <b>FetchedData</b> object) from the <span class="method_name">get_data</span> method. Additional
parameters (like database connection settings, template variables, etc.) may be specified as globals
(not recommended), passed as parameters to the constructor of the fetcher object, or passed as the $data_id
parameter of the <span class="method_name">get_data</span> method.
</p>
<p>
Keep in mind that if your HTML code references external files (e.g.
stylesheets or images), you should either
return null from your fetcher for the URLs of these files, or handle them yourself. Otherwise,
these files will not be available.
</p>

<pre>
class MyFetcherLocalFile extends Fetcher {
  var $_content;

  // Read the whole file once, when the fetcher is created
  function MyFetcherLocalFile($file) {
    $this->_content = file_get_contents($file);
  }

  // The requested ID is ignored; the preloaded content is wrapped in a FetchedData object
  function get_data($dummy1) {
    return new FetchedDataURL($this->_content, array(), "");
  }

  // An empty base URL is returned for resolving relative links
  function get_base_url() {
    return "";
  }
}
</pre>

Also see the <tt>sample.simplest.from.file.php</tt> and <tt>sample.simplest.from.memory.php</tt> files.

<h1>Class dependencies</h1>
The pipeline object contains the following:
<ul>
<li>one or more objects implementing the <b>Fetcher</b> interface;</li>
<li>zero or more objects implementing the <b>DataFilter</b> interface;</li>
<li>one object implementing the <b>Parser</b> interface;</li>
<li>zero or more objects implementing the <b>PreTreeFilter</b> interface;</li>
<li>one object implementing the <b>LayoutEngine</b> interface;</li>
<li>zero or more objects implementing the <b>PostTreeFilter</b> interface;</li>
<li>one object implementing the <b>OutputDriver</b> interface;</li>
<li>one object implementing the <b>Destination</b> interface.</li>
</ul>

<p>
There are no other dependencies between classes and interfaces (except "implements").
</p>

<p>
Note that the order of filters is important. Imagine you're using some kind of tree filter which adds a header block
containing HTML2PS-specific fields. In this case you must add this filter before PostTreeFilterHTML2PSFields, or
you'll get raw field codes in the generated output.
</p>

</body>
</html>