PDF to Docx conversion

Hello forum,

I'm looking for an easy way to convert a Syncfusion.PdfLoadedDocument into a WordDocument.
All examples show how to convert into PDF documents, but none of them show the reverse.

Any ideas how to achieve this?

Best regards,
Sascha

3 Replies

SL Sowmiya Loganathan Syncfusion Team October 16, 2019 01:13 PM UTC

Hi Sascha, 

Thank you for contacting Syncfusion support. 

At present, we do not have direct support for converting a PDF to a Word document. However, as a workaround, we can achieve your requirement by exporting PDF pages as image then add that images to Word Document using the PDF and DocIO libraries. Please refer to the KB link below for your reference, 

Please let us know if you need any further assistance on this. 

Regards, 
Sowmiya Loganathan  
 



SN Sascha Nebel October 16, 2019 01:34 PM UTC

"...by exporting PDF pages as image then add that images to Word Document..."

Thank you for the fast response!
The approach you describe uses images. This does not work on our side. We currently use the Syncfusion libraries as a pre-processing step to generate documents that can be read and interpreted by an API that extracts data from the results. Using images would force us to add OCR on top, and that is something we would like to avoid.

Meanwhile I was trying out a different approach:
I found the OPX example (https://www.syncfusion.com/products/opx/xpdf) where you converted a PDF into HTML. My idea is that, as a workaround, I could use this approach and then generate a WordDocument from the HTML output.
Unfortunately, the XPDF lib with the wrapper you provide cannot work with streams or byte arrays. It can only take paths. Furthermore, it's not ideal that there is no NuGet package available, so I had to grab the DLLs from the example manually.
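
A common way around a path-only API is to write the in-memory bytes to a temporary file, call the converter with that path, and read the result back. The following is a minimal, language-neutral sketch of that pattern in Python. `convert_via_temp_file` and the `convert` callable are hypothetical names standing in for the path-based wrapper call; they are not part of any Syncfusion or XPDF API.

```python
import os
import tempfile
from typing import Callable


def convert_via_temp_file(pdf_bytes: bytes,
                          convert: Callable[[str, str], None]) -> bytes:
    """Run a path-only converter on in-memory data.

    `convert(input_path, output_path)` is a hypothetical stand-in for the
    path-based wrapper call; it must write its result to `output_path`.
    """
    # The temporary directory and everything in it is removed on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        input_path = os.path.join(tmp_dir, "input.pdf")
        output_path = os.path.join(tmp_dir, "output.html")

        # Spill the in-memory PDF to disk so the converter can open it.
        with open(input_path, "wb") as f:
            f.write(pdf_bytes)

        convert(input_path, output_path)

        # Read the converted result back into memory before cleanup.
        with open(output_path, "rb") as f:
            return f.read()
```

The caller keeps working with byte arrays on both ends, and the temporary files never outlive the call. The cost is one extra disk round trip per document.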

So there is still no solution to this issue.
Help is appreciated!

Sascha


SL Sowmiya Loganathan Syncfusion Team October 17, 2019 04:18 PM UTC

Hi Sascha, 

We can convert PDF to HTML using XPDF (https://www.syncfusion.com/products/opx/xpdf) and then convert the resulting HTML file into a Word document using the DocIO library. However, the HTML file produced by the PDF to HTML conversion is not well formatted, and the DocIO library supports only HTML files that validate against either the XHTML 1.0 Strict or the XHTML 1.0 Transitional schema.  

Please refer to the documentation link below for more details, 

Note: If you load an HTML file that is not well formatted into a Word document, DocIO throws an error. So you need to convert that HTML file into a well-formatted HTML file first and then perform the HTML to Word conversion.  
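
Before handing the converted HTML to the Word import, it can help to check whether the markup is at least well-formed XML, since unclosed tags are a typical reason XHTML validation fails. The sketch below uses Python's standard XML parser for that check; `is_well_formed_xml` is a hypothetical helper name. It tests well-formedness only and does not validate against the XHTML 1.0 Strict or Transitional schema. This parser also rejects named HTML entities such as `&nbsp;` because it does not load the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    This is a pre-check only: it does not validate against any XHTML schema.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An unclosed <br> is fine in HTML but not in XHTML.
print(is_well_formed_xml("<html><body><p>ok<br></p></body></html>"))   # False
print(is_well_formed_xml("<html><body><p>ok<br/></p></body></html>"))  # True
```

If the check fails, the HTML needs to be cleaned up into well-formed markup before the HTML to Word step is attempted.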

Please let us know if you have any concerns on this. 

Regards, 
Sowmiya Loganathan 

