How to Convert HTML to DOCX

Hi,
I have an HTML file that I want to convert to DOCX, but it does not follow strict XHTML rules. I still want to convert it to DOCX using the Syncfusion.DocIO library.

What settings would let me convert any HTML file to DOCX, as long as it renders in a browser?

If possible, please share some sample code. I have attached my sample code.

Thanks in anticipation


Attachment: WebApplication2_94373689.zip

3 Replies

MR Manikandan Ravichandran Syncfusion Team December 17, 2021 06:11 AM UTC

Hi Mudasir,

On further checking the reported problem, we found that your input HTML file is not well formed (the br element does not have a proper end tag).
The Word library (DocIO) uses an XML reader to parse the content of the input HTML, so the input HTML must meet the XML standard.

To check whether an XML reader can read the HTML, please use the following code example, which follows the approach used in our product.

 
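using System.IO;
using System.Xml;

// Load throws an XmlException if the HTML is not well-formed XML.
XmlDocument m_xmlDoc = new XmlDocument();
using (FileStream fileStream = new FileStream(@"testalignment.html", FileMode.Open))
using (XmlReader reader = XmlReader.Create(fileStream))
{
    m_xmlDoc.Load(reader);
}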
 

Please find the exception from XMLReader when loading the HTML document.


If the XML reader cannot read the input HTML file, DocIO cannot process that HTML.

Please find the modified HTML content below, and the output Word document in the attachment.
 
<html>
<body>
    This is the text for page #1.
    <br style="page-break-before: always"/>
    Page #2...
    <br style="page-break-before: always"/>
    <br />
    Page #3...
</body>
</html>

Regards,
Manikandan Ravichandran



MU Mudasir December 17, 2021 09:27 AM UTC

Thanks for the response.
I have one more question. What is the purpose of XHTMLValidationType.None? I have set it to None, but the library still validates my HTML. How can I stop the library from validating my HTML? With every type (None, Strict, and Transitional) it throws the error.

Is there any option in this library to convert HTML to DOCX without validating the HTML?




Thanks



MR Manikandan Ravichandran Syncfusion Team December 20, 2021 09:43 AM UTC

Hi Mudasir,

XHTMLValidationType.None does not perform any schema validation, but the given HTML content must still be in XHTML 1.0 format. As mentioned earlier, we use an XML reader to parse the HTML content of the document, so the input HTML must meet the XML standard. If the XML reader cannot read the input HTML file, DocIO cannot process that HTML.

Please refer to the link below to learn more about XHTML validation.
https://help.syncfusion.com/file-formats/docio/html#xhtml-validation-types
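
If it helps, here is a minimal sketch of a pre-check you could run on an HTML string before passing it to DocIO. It only uses System.Xml. The class and method names (HtmlPrecheck, IsWellFormedXml) are hypothetical and are not part of DocIO. The sketch assumes that an HTML string which fails to load as XML would also fail in the conversion.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of DocIO.
public static class HtmlPrecheck
{
    // Returns true when the markup loads as well-formed XML.
    // Otherwise returns false and puts the parser message in 'error'.
    public static bool IsWellFormedXml(string html, out string error)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(html); // throws XmlException on malformed markup
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            // The message includes the line and position of the first problem.
            error = ex.Message;
            return false;
        }
    }
}
```

For example, a string containing a self-closed `<br/>` would be expected to pass this check, while the same string with an unclosed `<br>` would fail, and the error text points to the mismatched tag.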

Regards,
Manikandan Ravichandran
 

