
PDF to Docx conversion

Thread ID: 148316
Created: Oct 15, 2019 12:16 PM UTC
Updated: Oct 17, 2019 04:18 PM UTC
Platform: ASP.NET MVC
Replies: 3
Tags: PDF
Sascha Nebel
Asked On October 15, 2019 12:16 PM UTC

Hello forum,

I'm looking for an easy way to convert a Syncfusion.PdfLoadedDocument into a WordDocument.
All the examples show how to convert into PDF documents, but none of them show the reverse.

Any ideas how to achieve this?

Best regards,
Sascha

Sowmiya Loganathan [Syncfusion]
Replied On October 16, 2019 01:13 PM UTC

Hi Sascha, 

Thank you for contacting Syncfusion support. 

At present we do not have direct support for converting a PDF to a Word document. However, as a workaround, you can achieve your requirement by exporting the PDF pages as images and then adding those images to a Word document using the PDF and DocIO libraries. Please refer to the KB link below for reference,

Please let us know if you need any further assistance on this. 

Regards, 
Sowmiya Loganathan  
 


Sascha Nebel
Replied On October 16, 2019 01:34 PM UTC

"...by exporting PDF pages as image then add that images to Word Document..."

Thank you for the fast response!
The approach you describe uses images, which does not work for us. We currently use the Syncfusion libraries as a pre-processing step to generate documents that an API can read and interpret to extract data from the results. Using images would force us to add OCR on top, and that is something we would like to avoid.

Meanwhile I was trying out a different approach:
I found the OPX example (https://www.syncfusion.com/products/opx/xpdf) where you converted a PDF into HTML. My idea is that a workaround could be to use this approach and then generate a WordDocument from the HTML output.
Unfortunately, the XPDF lib with the wrapper you provide cannot take streams or byte arrays; it only accepts paths. Furthermore, it is not ideal that no NuGet package is available, so I had to grab the DLLs from the example manually.

So there is still no solution to this issue.
Help is appreciated!

Sascha

Sowmiya Loganathan [Syncfusion]
Replied On October 17, 2019 04:18 PM UTC

Hi Sascha, 

We can convert the PDF to HTML using XPDF (https://www.syncfusion.com/products/opx/xpdf) and then convert the resulting HTML file into a Word document using the DocIO library. However, the HTML file produced by the PDF-to-HTML conversion is not well-formed, and the DocIO library supports only HTML files that validate against either the XHTML 1.0 Strict or the XHTML 1.0 Transitional schema.

Please refer to the documentation link below for more details,

Note: If you load an HTML file that is not well-formed into a Word document, an error is thrown. You can therefore first convert the HTML file into a well-formed HTML file and then perform the HTML to Word conversion.
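
As a rough illustration of the note above, a quick well-formedness check can flag problematic HTML before it is handed to a converter. The sketch below uses Python's standard XML parser purely for illustration; it is not part of any Syncfusion API, and `is_well_formed` is a hypothetical helper name. Well-formedness is only a necessary condition: this check does not validate against the XHTML 1.0 Strict or Transitional schema, and such a strict parser may also reject named entities like `&nbsp;`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML.

    Hypothetical helper: it checks XML well-formedness only and does
    not validate against any XHTML 1.0 schema.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Every element is closed, so this parses as XML.
good = "<html><body><p>Hello<br/></p></body></html>"

# <br> is never closed, which is common in HTML but not allowed in XHTML.
bad = "<html><body><p>Hello<br></p></body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

If the check fails, the HTML would need to be cleaned up into well-formed XHTML first, as described in the note.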

Please let us know if you have any concerns on this. 

Regards, 
Sowmiya Loganathan 

