PDF To HTML

Hi,

I am developing an application using the ASP.NET MVC framework. The goal is to extract tables from a PDF file. My approach is to convert the PDF file into an HTML file and then parse through the HTML structure using its tags. I found this documentation on the Syncfusion website for converting PDF to HTML (https://www.syncfusion.com/products/opx/xpdf). However, I am having difficulty finding the required assemblies, Syncfusion.PdfToHtmlConverter.OPX and Syncfusion.PdfToHtmlWrapper, using the NuGet Package Manager. Any help would be appreciated. Thank you.
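Below is a minimal sketch of the "parse the HTML tags" step. It assumes the converter's output actually contains <table>/<tr>/<td> markup. This thread does not confirm that, and a later reply describes the output as flow layout with relative positioning, which may use other elements. The class and method names (HtmlTableReader, ReadRows) are hypothetical. The sketch uses regular expressions only to stay self-contained. It does not handle nested tables, and a real HTML parser such as HtmlAgilityPack would be more robust.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of each <td>/<th> cell, grouped by <tr> row.
// Assumes simple, non-nested table markup in the converted HTML.
public static class HtmlTableReader
{
    // One match per <tr>...</tr>; Singleline lets "." cross line breaks.
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // One match per <td>...</td> or <th>...</th> inside a row.
    private static readonly Regex CellPattern = new Regex(
        @"<t[dh]\b[^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining inline tag (<b>, <span ...>, ...) inside a cell.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    public static List<List<string>> ReadRows(string html)
    {
        var rows = new List<List<string>>();
        foreach (Match row in RowPattern.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups[1].Value))
            {
                // Strip inline tags, then decode entities such as &amp; and &nbsp;.
                string text = TagPattern.Replace(cell.Groups[1].Value, string.Empty);
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            // Skip rows that contained no cells.
            if (cells.Count > 0)
            {
                rows.Add(cells);
            }
        }
        return rows;
    }
}
```

For example, calling ReadRows(File.ReadAllText(outputPath)) on a converted file would return one list of cell strings per table row, provided the output really uses table markup.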

5 Replies 1 reply marked as answer

Sowmiya Loganathan (Syncfusion Team), July 16, 2020 09:13 AM UTC

Hi Kamal,   
  
Thank you for contacting Syncfusion support.   
  
We have analyzed your requirement. The assemblies for PDF to HTML conversion (Syncfusion.PdfToHtmlConverter.OPX.dll and Syncfusion.PdfToHtmlWrapper.dll) are not published on nuget.org, so please get them from the sample at the download link below:
  
Kindly try the above assemblies and let us know if you need any further assistance on this.
  
Regards,  
Sowmiya Loganathan  
  




Kamal Qureshi, July 16, 2020 01:50 PM UTC

Hi Sowmiya,

Thank you for your response. I have added the assemblies, but I still cannot compile the code because of errors. I am trying to run the example solution that Syncfusion provides at https://www.syncfusion.com/products/opx/xpdf, but it produces errors.

I keep getting the error "Argument 1: cannot convert from 'string' to 'byte[]'" at this line:
PdfLoadedDocument ldoc = new PdfLoadedDocument(txtImageFile.Text);

Please refer to the code below:

//Initializing PdfToHtmlConverter
PdfToHtmlConverter converter = new PdfToHtmlConverter();
//Initializing and applying settings
PdfToHtmlConverterSettings setting = new PdfToHtmlConverterSettings();
setting.IsFrame = false;
setting.AbsolutePositioning = false;
converter.Settings = setting;
//Loading the input PDF document.
PdfLoadedDocument ldoc = new PdfLoadedDocument(txtImageFile.Text);
//Converting PDF to HTML
converter.Convert(inputPath, outputPath, ldoc.Pages.Count);
ldoc.Close(true);


Sowmiya Loganathan (Syncfusion Team), July 17, 2020 09:47 AM UTC

Hi Kamal,   
   
We have analyzed the reported issue and created a sample in ASP.NET MVC to convert PDF to HTML. It works fine on our end; please find the download link below:
   
   
Based on the provided details, we suspect that you are trying to use the code in the ASP.NET Core platform. Please refer to the code snippet below to load the PDF document in ASP.NET Core:
   
//Load the PDF document   
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);   
   
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);   
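Below is a slightly fuller sketch of the same idea. It assumes the referenced Syncfusion PDF assembly exposes the PdfLoadedDocument(Stream) overload used in the snippet above and the PdfLoadedDocument(byte[]) overload named in the compiler error. The PdfLoading class and its method names are hypothetical. The using block releases the FileStream even if loading fails. Loading is only part of the task: according to the note later in this thread, the conversion itself is supported only in .NET Framework applications.

```csharp
using System.IO;
using Syncfusion.Pdf.Parsing;

// Hypothetical helper showing two ways to open a PDF when the
// PdfLoadedDocument(string) overload is not available.
public static class PdfLoading
{
    // Option 1: pass a Stream, as in the snippet above.
    public static int CountPagesFromStream(string path)
    {
        // The using block closes the FileStream even if loading throws.
        using (FileStream docStream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
            int pageCount = loadedDocument.Pages.Count;
            loadedDocument.Close(true);
            return pageCount;
        }
    }

    // Option 2: pass the file contents as byte[], the parameter type named in the compiler error.
    public static int CountPagesFromBytes(string path)
    {
        byte[] pdfBytes = File.ReadAllBytes(path);
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(pdfBytes);
        int pageCount = loadedDocument.Pages.Count;
        loadedDocument.Close(true);
        return pageCount;
    }
}
```

In the original code, either form would replace new PdfLoadedDocument(txtImageFile.Text), with txtImageFile.Text passed as the path argument.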
  
   
Please try the above solution on your end and let us know the result. If you still face the issue, please provide the following details. They will help us provide a precise solution.
   
  • Platform details
  • Sample / Code snippet
  • Input PDF document
Regards, 
Sowmiya Loganathan 


Marked as answer

Mohamed Kamal, May 27, 2025 07:50 AM UTC

Hi, I have version 20.3.0.47 in a 64-bit web application.

I need those DLLs in builds that are compatible with my version and architecture.



Arumugam Muppidathi (Syncfusion Team), May 29, 2025 04:20 AM UTC

Hi Mohamed Kamal,


We have checked the reported issue on our end. Upon further analysis, we were unable to replicate it, and PDF to HTML conversion works as expected with the provided version on our end. Our Syncfusion .NET PDF library converts PDF documents to HTML with the help of XPDF, an open-source PDF viewer library. We have customized XPDF to enable PDF to HTML conversion, and our implementation preserves HTML content as a flow layout with relative positioning.

 

Please use the link below to download the following assemblies:

  • Syncfusion.PdfToHtmlConverter.OPX
  • Syncfusion.PdfToHtmlWrapper

 

http://www.syncfusion.com/downloads/support/directtrac/general/ze/PdfToHtmlOPX1940268788


These assemblies are kept inside the Assemblies folder in the project's directory. Please refer to the screenshot below.

 

[Screenshot: the Assemblies folder inside the project's directory]

 

In addition, we have attached the sample and documentation below for your reference:

 

Documentation:  How to convert PDF to HTML using C#?

 

Note: PDF to HTML conversion is supported only in .NET Framework applications. It is not supported in .NET Core applications.

 

Please try the above solution and let us know the result. Kindly get back to us if you need any further assistance.


Regards,
Arumugam M

