HTML to PDF Conversion Using the WebKit Rendering Engine

Syncfusion Essential PDF now supports HTML to PDF conversion by using the WebKit rendering engine, in addition to the existing Internet Explorer and Gecko renderers. The converter works in both x86 and x64 environments and can be integrated into any application on .NET platforms such as Windows Forms, WPF, ASP.NET, and ASP.NET MVC to convert URLs, HTML strings, images, and SVG to PDF. The WebKit rendering engine has better support for modern web standards, so in some cases its output is more accurate than that of the other available renderers.

Limitations in the IE Rendering Engine

Until now, Syncfusion’s HTML to PDF converter relied on Microsoft’s MSHTML library to do the conversion from HTML to PDF. The actual conversion happens in two steps. The first step is to convert HTML into a metafile. The metafile is then rendered to PDF. The main advantage of this kind of conversion is that the text rendered remains searchable in PDF. This is an extremely important requirement for a lot of our customers.

However, with version 9 of Internet Explorer, Microsoft started using hardware acceleration to produce bitmap images instead of metafiles while retrieving snapshots, completely removing the ability to render selectable or searchable text within PDF. Users can work around the problem by making some registry changes, but may not be satisfied with the result, so a better alternative was needed.

After considerable research, a new converter based on the WebKit renderer was created. A document rendered by WebKit contains vector graphics instead of raster images. This reduces file size and allows users to perform operations such as text search, selection, and clipboard copy. Apart from overcoming the limitations of the Internet Explorer rendering engine, the new WebKit renderer also provides better support for rendering HTML5, CSS3, and SVG content.

Rendering Engines Comparison

The following table shows the various rendering engines available and the features they support.

Table 1: Rendering Engines Comparison

Feature                                       IE Renderer  WebKit Renderer  Gecko Renderer
--------------------------------------------  -----------  ---------------  --------------
Convert URLs                                  Yes          Yes              Yes
Convert from string                           Yes          Yes              Yes
HTML tags                                     Yes          Yes              Yes
Images                                        Yes          Yes              No
Hyperlinks                                    Yes          Yes              No
CSS                                           Yes          Yes              No
JavaScript                                    Yes          Yes              No
ActiveX plugins                               Yes          Yes              No
HTML 5                                        Yes          Yes              No
Page breaks                                   Yes          Yes              Yes
Vector graphics (selectable/searchable text)  See note     Yes              Yes
Convert to image                              Yes          No               No
Handling image and text split across pages    Yes          Yes              No
PDF/A-1b                                      Yes          No               No
Tagged PDF                                    Yes          No               No
Page settings                                 Yes          Yes              Yes
Header and footer                             Yes          Yes              Yes
Windows authentication                        Yes          Yes              Yes

Note: With the IE renderer, HTML 5 pages are rendered as bitmaps.

Using the New WebKit-Powered Conversion Engine

The following step-by-step procedure explains how to convert a webpage by using the WebKit renderer.

1. Create an instance of the PdfDocument class to create a PDF document. The following code sample shows how:

 

    //Create a PDF document
    PdfDocument document = new PdfDocument();

 

2. Set the page dimensions and margins for the document by using the PageSettings property, as shown in the following code sample. Remember to set the page dimensions before you add a page to the document.

    //Set page margins (nudMargin is a UI control in the sample application) and dimensions
    document.PageSettings.SetMargins((float)this.nudMargin.Value);
    document.PageSettings.Size = PdfPageSize.A4;

 

3. Add a new page to the document. Rendering begins on this page, as the following code sample shows:

    //Add a page to the document
    PdfPage page = document.Pages.Add();

 

4. Get the client size (the actual area available for rendering) of the page and convert it into pixels by using the PdfUnitConvertor class. This is necessary because WebKit measures in pixels, whereas PDF measures in points. The following code sample demonstrates this:

    //Get the client size of the page (width and height) in pixels
    PdfUnitConvertor convertor = new PdfUnitConvertor();
    float width = convertor.ConvertToPixels(page.GetClientSize().Width, PdfGraphicsUnit.Point);
    float height = convertor.ConvertToPixels(page.GetClientSize().Height, PdfGraphicsUnit.Point);

 

5. Create a metafile layout format by using the PdfMetafileLayoutFormat class. This format provides various options like spanning or paginating the rendering to multiple pages, handling text and image split across multiple pages, and more. The following code sample illustrates this:

 

    //Create a layout format for the metafile
    PdfMetafileLayoutFormat metafileFormat = new PdfMetafileLayoutFormat();
    metafileFormat.Break = PdfLayoutBreakType.FitPage;
    metafileFormat.Layout = PdfLayoutType.Paginate;
    metafileFormat.SplitTextLines = false;
    metafileFormat.SplitImages = false;

 

6. To perform the conversion:

     a. Create an instance of the WebKitHtmlConverter class to create an HTML converter.

     b. Most importantly, point the converter to the Qt binaries (see Prerequisites). Set the WebKitPath property of the WebKitHtmlConverter to the absolute or relative path of the QTBinaries folder.

     c. Use the EnableJavaScript and EnableHyperlinks properties of the WebKitHtmlConverter class to enable JavaScript and hyperlinks in the rendering, respectively.

     d. The WebKitHtmlConverter also provides other properties, for example to add an AdditionalDelay or to set the WebKitViewPort.

     e. Use the Convert() method of the WebKitHtmlConverter class to perform the conversion, and use the Render() method of the HtmlToPdfResult class to draw the result onto the PDF page.

The following code sample demonstrates these steps:

 

    //Initialize the WebKit rendering engine
    WebKitHtmlConverter renderer = new WebKitHtmlConverter();
    renderer.EnableHyperlinks = true;
    renderer.EnableJavaScript = true;
    //Provide the path to the Qt WebKit binaries
    renderer.WebKitPath = @"QTBinaries";
    //Convert the URL/HTML string by providing the required width and height
    HtmlToPdfResult result = renderer.Convert(@"http://www.google.com", (int)width, (int)height);

    //Create a layout format to enable pagination
    PdfMetafileLayoutFormat format = new PdfMetafileLayoutFormat();
    //Fit the HTML conversion to the page
    format.Break = PdfLayoutBreakType.FitPage;
    //Enable document pagination
    format.Layout = PdfLayoutType.Paginate;
    //Disable splitting of text lines across pages
    format.SplitTextLines = false;
    //Disable splitting of images across pages
    format.SplitImages = false;
    //Render the HTML content on the PDF page
    result.Render(page, format);

 

7. Save the PDF document to disk and close it to release its resources, as shown in the following code sample:

    //Save and close the document
    document.Save("Sample.pdf");
    document.Close(true);
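
Step 4 converts the page size from points to pixels. The arithmetic behind such a conversion can be sketched as follows. This is a minimal, library-free illustration and not the implementation of PdfUnitConvertor: the class and method names (UnitSketch, PointsToPixels) are hypothetical, and the 96 DPI resolution is an assumption (the common Windows default). The actual converter may use a different resolution.

```csharp
using System;

public static class UnitSketch
{
    // One PDF point is 1/72 of an inch.
    public const float PointsPerInch = 72f;

    // Convert a length in points to pixels at the given resolution (dots per inch).
    public static float PointsToPixels(float points, float dpi)
    {
        return points * dpi / PointsPerInch;
    }
}

public static class UnitSketchDemo
{
    public static void Main()
    {
        // Assumption: 96 DPI; adjust to the resolution of the target system.
        const float dpi = 96f;

        // A full A4 page measures 595 x 842 points (margins not subtracted).
        float width = UnitSketch.PointsToPixels(595f, dpi);
        float height = UnitSketch.PointsToPixels(842f, dpi);

        Console.WriteLine("{0:F1} x {1:F1} pixels", width, height);
    }
}
```

At 96 DPI, a full A4 page works out to roughly 793 by 1123 pixels. The client size passed to the converter is smaller once margins are subtracted.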

 

The WebKitHtmlConverter class shares several properties with the converters for the other rendering engines. However, the following two properties are specific to the WebKit rendering engine, as shown in Table 2.

Table 2: WebKit Rendering Engine Properties

Property        Type                 Default                          Description
--------------  -------------------  -------------------------------  -----------------------------------------------------------------------------------
WebKitPath      System.String        string.Empty                     Gets or sets the path to the WebKit binaries. The path can be absolute or relative.
WebKitViewPort  System.Drawing.Size  System.Drawing.Size(1280, 1024)  Gets or sets the WebKit viewport size.

Prerequisites

The Qt WebKit converter requires msvcp100.dll and msvcr100.dll to convert webpages to PDF, so these libraries must be available on the machine. On 64-bit machines, place them in C:\Windows\SysWOW64; on 32-bit machines, place them in C:\Windows\System32.

The WebKit converter may create a PDF with a blank page in the following cases:

  • The webpage (HTML) is not available or accessible.
  • msvcp100.dll and msvcr100.dll are not present in the location given above.
  • Any Qt binary is missing from the folder specified in WebKitPath.
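
Some of these conditions can be checked before the conversion runs. The following is a minimal sketch and not part of the Syncfusion API: the WebKitPreflight class and its FindProblems method are hypothetical names. It only verifies that the Qt binaries folder exists and that the two runtime libraries are present in a given directory. It does not check individual Qt binaries, because their file names are not listed here, and it does not test whether the webpage is reachable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WebKitPreflight
{
    // Returns human-readable problems; an empty list means every checked item was found.
    public static List<string> FindProblems(string qtBinariesPath, string runtimeDirectory)
    {
        var problems = new List<string>();

        // The folder that WebKitPath points to must exist.
        if (!Directory.Exists(qtBinariesPath))
            problems.Add("Qt binaries folder not found: " + qtBinariesPath);

        // The Visual C++ runtime libraries named in the prerequisites.
        foreach (string dll in new[] { "msvcp100.dll", "msvcr100.dll" })
        {
            string fullPath = Path.Combine(runtimeDirectory, dll);
            if (!File.Exists(fullPath))
                problems.Add("Missing runtime library: " + fullPath);
        }

        return problems;
    }
}

public static class PreflightDemo
{
    public static void Main()
    {
        // SysWOW64 on 64-bit Windows, System32 on 32-bit Windows, as described above.
        string windows = Environment.GetFolderPath(Environment.SpecialFolder.Windows);
        string runtimeDir = Path.Combine(
            windows, Environment.Is64BitOperatingSystem ? "SysWOW64" : "System32");

        foreach (string problem in WebKitPreflight.FindProblems("QTBinaries", runtimeDir))
            Console.WriteLine(problem);
    }
}
```

Running a check like this before calling Convert() makes a missing dependency visible as a message instead of a blank PDF.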

You can download a working sample from the following link.

www.syncfusion.com/downloads/support/directtrac/130145/WebKitHtmlToPDF727676672.zip

You can download the assemblies from the following link.

www.syncfusion.com/downloads/support/directtrac/general/WebKitHtmlConverter-497589520.zip

Comments (1) -

  • Oleg Gnetuia
    Aug 26, 2015

    Hi,

    We got a blank sample.pdf when running your sample. The temporary file \4b544231-fd9b-4788-845c-5b29991da0e0.pdf referenced by result.WebKitFilePath (e.g. C:\Users\<UserName>\AppData\Local\Temp\4b544231-fd9b-4788-845c-5b29991da0e0) was correct, but sample.pdf was empty.

    We double-checked that none of these cases apply:
        If the webpage (HTML) is not available or accessible.
        If msvcp.dll and msvcr.dll are not present in the SysWOW64 location.
        If any QT binary is not present in the mentioned location.

    Can you please explain why the resulting document is blank?

    Thank you.
