Web applications sometimes need to generate custom PDF files, and there are several ways to do it. Recently, I worked on a project that involved an employment application. For this project, I needed to provide an HTML preview of the completed application as well as a PDF. Since I was going to be formatting the application with HTML anyway, I wanted a way to turn that HTML into a PDF. The solution I found was html2pdf, a series of scripts for turning an HTML page into a PDF using PHP.
Getting Started: Download and Install html2pdf
You can find html2pdf at http://www.tufat.com/s_html2ps_html2pdf.htm. You’ll need to download the script and unzip it. To install the script, simply copy the /html2pdf folder into a folder in your web site somewhere. Make sure you remove the samples and demo folders. These are provided to show how to use html2pdf but shouldn’t be live on your site.
To use html2pdf, you will need either Ghostscript or PDFLib installed on your server. PDFLib creates higher quality and faster rendering PDF files but it is a commercial library for which you must purchase a license. Ghostscript will for most purposes generate acceptable results. When using Ghostscript, your HTML will be first converted to Postscript and then to PDF. As a result, the script does use a good deal of resources. Check your php.ini file because you’ll need to allow 32-64MB for the PHP memory limit. You’ll also want to set your maximum script execution time to 2-3 minutes. The default is usually 30 seconds. My project initially worked fine with that but once it moved into production and more PDFs were being generated, it started timing out.
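If editing php.ini is not an option, the same limits can often be raised for a single request from the script itself. The following is a minimal sketch, assuming your host allows runtime overrides; the values 64M and 180 seconds are simply the upper ends of the ranges suggested above, not settings required by html2pdf.

```php
<?php
// Raise the limits for this request only, before any PDF work starts.
// ini_set() returns the previous value on success and false on failure.
$previousLimit = ini_set('memory_limit', '64M');

// Allow up to 3 minutes of execution time instead of the usual 30 seconds.
set_time_limit(180);

if ($previousLimit === false) {
    // Some hosts lock this setting; in that case php.ini must be changed.
    error_log('memory_limit could not be changed at runtime.');
}

echo 'memory_limit is now ', ini_get('memory_limit'), "\n";
```

Setting the limits only in the script that generates PDFs keeps the rest of the site on its normal, tighter limits.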
Other requirements include PHP 4.1 or higher. html2pdf will work just fine with PHP 5. In fact, my project was tested with PHP 5.3 and it worked beautifully. Your PHP will need to have GD and DOM XML extensions. The Zlib extension is also highly recommended.
Fortunately, there’s a minimal amount of configuration needed to use html2pdf. The settings you will need to adjust are all located in the config.inc.php file. The most relevant items are the path to your Ghostscript executable (hint: on most Linux distros it’s /usr/bin/gs) and the path to your Type 1 fonts repository (again on most Linux systems: /usr/share/fonts/type1/gsfonts). If you are using PDFLib, the important configurations include the location of the library, your license key and the location of PDFLib’s configuration file.
Optional configuration options include the default filename for generated PDFs. You can also set the default encoding and the user agent string sent by html2pdf’s fetcher class when it requests the HTML to be converted.
Creating a PDF
html2pdf will parse your linked stylesheets and render the HTML to create a PDF. As a result, it usually retrieves the page to be converted using its fetcher class. While you can write custom fetcher classes to handle different scenarios such as dealing with authentication, we’re just going to look at the basic method of loading an HTML page and generating a PDF from it.
The first step is to include the required files from html2pdf. The include paths depend on where you installed the library; this article assumes a subdirectory called /html2pdf.
The next step is to parse the configuration file for html2ps, which converts the HTML to an interim PostScript file so that Ghostscript can convert it to a PDF.
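The two preparation steps described above could be sketched as follows. This is a sketch under assumptions, not the article's original listing: the file names pipeline.factory.class.php and html2ps.config and the function parse_config_file() are assumed from the html2ps distribution and should be checked against your copy. The is_file() guard is added here only so that a wrong path produces a readable message.

```php
<?php
// Assumption: html2pdf was copied into ./html2pdf next to this script.
$html2pdfDir = __DIR__ . '/html2pdf/';

// Assumed file names; verify them against your copy of the distribution.
$requiredFiles = array(
    $html2pdfDir . 'config.inc.php',
    $html2pdfDir . 'pipeline.factory.class.php',
);

// Collect any files that are not where we expect them.
$missing = array_filter($requiredFiles, function ($file) {
    return !is_file($file);
});

if (count($missing) > 0) {
    echo "html2pdf files not found:\n  " . implode("\n  ", $missing) . "\n";
} else {
    foreach ($requiredFiles as $file) {
        require_once $file;
    }
    // Assumed function name: parses the html2ps configuration file.
    parse_config_file($html2pdfDir . 'html2ps.config');
}
```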
html2pdf requires a number of global variables to hold configuration information. The first of these is an array which holds a number of configuration options on how the HTML should be rendered. Here’s my code:
global $g_config;
$g_config = array(
    'cssmedia'         => 'screen',
    'renderimages'     => true,
    'renderforms'      => false,
    'renderlinks'      => true,
    'mode'             => 'html',
    'debugbox'         => false,
    'draw_page_border' => false
);
A few of these options merit some additional discussion. You can tell html2pdf which CSS files to use by setting cssmedia. If you defined special CSS files for print media, setting cssmedia to “print” will cause those stylesheets to be used in place of the “screen” stylesheets. For my project, I wanted to capture the same formatting as the HTML preview so I set cssmedia to “screen”.
If you’re experiencing some problems with your PDF output, you can set debugbox to true. This setting causes each box area to be outlined in the PDF output. This allows you to debug how the PDF is being generated and to see how your HTML block elements are being converted to PDF.
While it might seem that renderfields would have something to do with HTML form fields, it doesn’t. The renderfields setting affects whether special fields like ##PAGE## will be interpreted by html2pdf.
The renderforms setting deals with interactive PDF forms. Likewise, renderlinks determines whether or not the internal and external hyperlinks are rendered in the PDF.
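To make the effect of these options concrete, here is a small sketch that derives a debugging variant of the configuration from the one shown earlier. The helper with_overrides() is hypothetical and not part of html2pdf; it only uses PHP's array_merge(). The option names are the ones from the article's array.

```php
<?php
// Baseline options, copied from the configuration array shown earlier.
$baseConfig = array(
    'cssmedia'         => 'screen',
    'renderimages'     => true,
    'renderforms'      => false,
    'renderlinks'      => true,
    'mode'             => 'html',
    'debugbox'         => false,
    'draw_page_border' => false,
);

// Hypothetical helper: returns a copy of $config with $overrides applied.
function with_overrides(array $config, array $overrides)
{
    return array_merge($config, $overrides);
}

// Use the print stylesheets and outline every box while troubleshooting.
$debugConfig = with_overrides($baseConfig, array(
    'cssmedia' => 'print',
    'debugbox' => true,
));

print_r($debugConfig);
```

Keeping the baseline untouched and deriving variants from it makes it easy to switch the debug outlines off again once the layout problem is found.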
The next step in preparing to generate a PDF is to set up the media size and margins. Again, here is my code:
$media = Media::predefined('Letter');
$media->set_landscape(false);
$media->set_margins(array(
    'left'   => 15,
    'right'  => 15,
    'top'    => 25,
    'bottom' => 0
));
$media->set_pixels(960);
I’m using a predefined media size, in this case “Letter”. The library contains a series of predefined media sizes such as “Letter”, “A4” and so on. The next setting determines whether or not I want to use landscape orientation. For my project, portrait orientation was needed so I set this to false. The page margins are set using an array. The numeric values here are set in millimeters. 15 millimeters is approximately half an inch and 25 millimeters is about 1 inch. For my project, I left the bottom margin as 0 because I had defined page breaks and setting a bottom margin caused the output to break in an unpredictable place. Finally, you need to tell html2pdf how to scale its output. You do this by setting the maximum width of the HTML page in pixels. This value will then be used to calculate the size for the PDF. Since I use the Blueprint CSS framework, my page is constrained to 960 pixels.
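As a quick check on the margin values, the millimeter figures can be converted to inches with plain arithmetic. This sketch is independent of html2pdf; mm_to_inches() is a hypothetical helper defined here for illustration.

```php
<?php
// Hypothetical helper: 1 inch is exactly 25.4 millimeters.
function mm_to_inches($mm)
{
    return $mm / 25.4;
}

// The margin values used in the article, in millimeters.
$margins = array('left' => 15, 'right' => 15, 'top' => 25, 'bottom' => 0);

foreach ($margins as $side => $mm) {
    // 15 mm prints as 0.59 in, 25 mm as 0.98 in, 0 mm as 0.00 in.
    printf("%-6s %2d mm = %.2f in\n", $side, $mm, mm_to_inches($mm));
}
```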
Next, we calculate our scale factors. Here’s the code:
global $g_px_scale;
$g_px_scale = mm2pt($media->width() - $media->margins['left'] - $media->margins['right']) / $media->pixels;
global $g_pt_scale;
$g_pt_scale = $g_px_scale * 1.43;
The first scale we calculate is the pixel scale. To do this we take the width of our media and subtract the right and left margins. Then we divide it by the width of our HTML in pixels. We then use this number and multiply by 1.43 to get our point scale.
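The arithmetic can be worked through by hand for the settings used in this article. The sketch below does not call html2pdf: sketch_mm2pt() is a stand-in written for this example (1 inch = 25.4 mm = 72 points), and the Letter page width of 216 mm is an assumed, rounded value.

```php
<?php
// Stand-in for the library's mm2pt(): 25.4 mm per inch, 72 points per inch.
function sketch_mm2pt($mm)
{
    return $mm * 72 / 25.4;
}

$pageWidthMm   = 216; // assumed width of Letter paper, rounded
$marginLeftMm  = 15;
$marginRightMm = 15;
$pagePixels    = 960; // width of the HTML layout in pixels

// Printable width in points divided by the HTML width in pixels.
$pxScale = sketch_mm2pt($pageWidthMm - $marginLeftMm - $marginRightMm) / $pagePixels;

// The point scale is the pixel scale multiplied by 1.43, as in the article.
$ptScale = $pxScale * 1.43;

printf("px scale: %.4f\n", $pxScale); // 0.5492
printf("pt scale: %.4f\n", $ptScale); // 0.7854
```

In other words, with these settings each HTML pixel occupies a little more than half a point on the printed page.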
Now we’re ready to do our magic. The pipeline class will take care of the conversion for us. Here’s a snippet of code:
$pipeline = PipelineFactory::create_default_pipeline("", "");
$pipeline->pre_tree_filters = new PreTreeFilterHTML2PSFields();
$pipeline->destination = new DestinationDownload($filename);
$pipeline->process($url, $media);
The first two lines of this code are standard code taken from the documentation for html2pdf. They set up a new instance of the default pipeline class and tell it how to filter the HTML for conversion. The lines that interest us are the next two. The third line is optional. Here we’re telling the script to send the file as an HTTP attachment so that the browser prompts the user to download it. We pass the filename that we want our downloaded PDF to have as a parameter. Without this line, the file will be given the default filename defined in config.inc.php (“unnamed” by default). The fourth line does the actual generation of the PDF. You pass the process method two parameters: the URL that you want to convert and the previously defined $media object that controls the page size, margins and scale.
That’s pretty much all there is to turning an HTML page into a PDF. There are a couple of small items I’ve found useful. For my project, I needed to place some page breaks in specific places. You can do this by adding a special HTML comment to the page being converted at each point where a break is needed.
This will cause the html2pdf script to insert a page break. There are also a number of special fields that can be added to your HTML that will be rendered in the final PDF if you specify renderfields = true. These include fields for the current page number, the total number of pages, the timestamp and more. Take a look at the html2pdf documentation for more information on these fields.
Finally, if you need to do something special such as render HTML from a file into a PDF or authenticate to retrieve the page to be rendered, you will need to create a custom fetcher class to handle this. You can learn more about this process from the documentation.
Creating dynamic PDF files is a pretty common web development task. If you have to create HTML and PDF versions of the same page, html2pdf can greatly simplify that process by using your existing HTML to generate a PDF. The basic usage is pretty straightforward, but the library is extensible, so you can create custom classes to handle authentication or HTML stored in files. You’ll be surprised how simple it is to make a PDF.