word to pdf with invalid xhtml breaks word paging

see the program attached: https://gist.github.com/cyptus/41697ffda50ab18d79db3d555d8cd5e6#file-program-cs
it tries to insert HTML content via mail merge.
if the HTML content is not valid XHTML, the insert throws an exception, which is then handled by a fallback.
the failed attempt to insert the invalid XHTML still affects the generated PDF document: the Word paging is broken in the output:

Screenshot of PDF document with broken paging:


the InsertXHtml method should not have any effects if invalid XHTML is provided and it does fail.

the break <br> can be replaced with <br /> (valid xhtml), and the program works fine.


Attachment: word2pdfpagingbug_17a3d6c1.zip

16 Replies 1 reply marked as answer

HC Hemalatha Chiranjeevulu Syncfusion Team November 16, 2020 11:45 AM UTC

Hi Pascal,

Thanks for contacting Syncfusion support.

We can reproduce the reported issue at our end, and we suspect it to be a defect. We will validate this issue and update you with more details on 18th November 2020.

Please let us know if you have any other questions.

Regards,
Hemalatha C



HC Hemalatha Chiranjeevulu Syncfusion Team November 18, 2020 07:16 PM UTC

Hi Pascal,

Sorry for the inconvenience.

We are facing some complexities while validating the reported issue. We are currently validating this issue with high priority and will update you with more details on 20th November 2020.

Please let us know if you have any other questions.

Regards,
Hemalatha C



HC Hemalatha Chiranjeevulu Syncfusion Team November 20, 2020 09:32 PM UTC

Hi Pascal,

Thank you for your patience.

On further analysis of the given details, we found that the HTML string you are trying to insert is not well formed. Appending an HTML string that is not well formed raises a problem internally, and because of this the page field is not updated. So, in the modified code below, we validate the HTML string first and insert the HTML content into the paragraph only when it is valid, so that the page field is updated. We suggest using the modified code below to meet your requirement.


 
private static void MailMerge_MergeField(object sender, MergeFieldEventArgs args)
{
    var paragraph = args.CurrentMergeField.OwnerParagraph;
    var paraIndex = paragraph.OwnerTextBody.ChildEntities.IndexOf(paragraph);
    var fieldIndex = paragraph.ChildEntities.IndexOf(args.CurrentMergeField);

    bool isValidHtml = paragraph.Document.LastSection.Body.IsValidXHTML(args.FieldValue.ToString(), paragraph.Document.XHTMLValidateOption);
    if (isValidHtml)
    {
        paragraph.OwnerTextBody.InsertXHTML(args.FieldValue.ToString(), paraIndex, fieldIndex);
        args.Text = string.Empty;
    }
    else
    {
        args.Text = "insert failed";
    }
}

 

Please let us know if you have any other questions.

Regards,
Hemalatha C
 



PS Pascal Seifert November 23, 2020 07:23 AM UTC

thanks for the answer.
we already tried to validate the XHTML before:

paragraph.Document.LastSection.Body.IsValidXHTML("<br>", XHTMLValidationType.None)

will return true, but it is in fact not valid XHTML and will fail in the InsertXHTML call.
So the issue remains the same, as the validation method does not validate in the same manner as the insert.
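
One possible stopgap for this mismatch is an independent well-formedness check that runs before InsertXHTML. The following is a minimal sketch, not a DocIO feature: `XhtmlPreCheck` and `IsWellFormedFragment` are hypothetical names, and the check relies only on the standard `System.Xml.Linq` parser. It assumes the input is a fragment without a DOCTYPE declaration, and it is an assumption that DocIO accepts everything this check accepts (for example, named HTML entities such as `&nbsp;` are rejected here).

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of DocIO.
public static class XhtmlPreCheck
{
    // Returns true when the fragment parses as well-formed XML.
    // Assumes a fragment without a DOCTYPE declaration.
    public static bool IsWellFormedFragment(string fragment)
    {
        try
        {
            // Wrap in a dummy root so text and several sibling nodes can be parsed.
            XElement.Parse("<root>" + fragment + "</root>");
            return true;
        }
        catch (XmlException)
        {
            // Unclosed tags such as <br> end up here.
            return false;
        }
    }
}
```

With such a check, the merge handler could skip the InsertXHTML call and go straight to the fallback whenever `XhtmlPreCheck.IsWellFormedFragment(html)` returns false, so the failing insert never touches the document.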


HC Hemalatha Chiranjeevulu Syncfusion Team November 24, 2020 10:31 PM UTC

Hi Pascal,

Thank you for your update.

In DocIO, the default XHTMLValidationType is Transitional. So, it raises a problem internally and fails when appending an HTML string that is not well formed. To insert the HTML content, we suggest reusing the XHTMLValidationType from the Word document instance instead of specifying it explicitly, as in the code snippet we shared in our last update.

Please let us know if you have any other questions.

Regards,
Hemalatha C



PS Pascal Seifert November 25, 2020 07:48 AM UTC

thanks for the reply!

XHTMLValidationType set to Transitional does not let me validate ANY xhtml.
What is a valid example input for "Transitional" type that does return true?

I think the whole "ValidXHtml" does not work well together with the "InsertXHtml".
Together with the bug that XHTML reported as valid under the "None" type still breaks the PDF (initial issue), i cannot see a good solution.

i tried <br />
and a full html like: <html><body><h1>test</h1></body></html>
even a full xhtml  fails:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>test</title>
</head>
<body>
<p>test</p>
</body>
</html>


RM Ramaraj Marimuthu Syncfusion Team November 26, 2020 04:40 PM UTC

Hi Pascal,

Thank you for your update.

At present, we are investigating your query and will share further details by November 27, 2020.

Regards,
Ramaraj Marimuthu 



HC Hemalatha Chiranjeevulu Syncfusion Team November 30, 2020 06:05 PM UTC

Hi Pascal, 
 
Thank you for your patience. 
 
We generated the document on our local machine, and we are facing some difficulties while deploying the application in the Azure environment. We will check in the Azure environment and share an update, along with the above validation details, in 2 days.
 
 
We have checked the given HTML strings, and the cases below satisfy Transitional XHTML validation. Please refer to the code snippet below, which we tried at our end:
 
//Creates a new Word document
WordDocument document = new WordDocument();
document.EnsureMinimal();
//Html string to be inserted
string htmlstring = "<html><body><h1>test</h1></body></html>";
//Validates the Html string
bool isValidHtml = document.LastSection.Body.IsValidXHTML(htmlstring, XHTMLValidationType.Transitional);
//When the Html string passes validation, it is inserted to the document
if (isValidHtml)
{
    //Inserts the Html string as the first item of the first paragraph in the document
    document.Sections[0].Body.InsertXHTML(htmlstring, 0, 0);
}
//Saves and closes the document
document.Save("Sample.docx");
document.Close();
 
Also, the given full HTML works fine at our end with the modified code snippet below.
 
//Html string to be inserted 
string htmlstring = File.ReadAllText(@"D:\FullHTML.txt"); 
//Validates the Html string 
bool isValidHtml = document.LastSection.Body.IsValidXHTML(htmlstring, 
XHTMLValidationType.Transitional);  

For your reference, we have attached the input HTML and generated Word document and it can be downloaded from the below link: 
https://www.syncfusion.com/downloads/support/forum/159692/ze/Documents-849570038  
 
Please let us know if you have any other questions. 
 
Regards, 
Hemalatha C 



PS Pascal Seifert December 1, 2020 08:13 AM UTC

Thanks for your example, i tried your exact code and it did not work with latest version 18.3.0.52 of Syncfusion.DocIORenderer.Net.Core in dotnet core 3.1.300.
See the attached full project, here is the code snippet from it:


            //Creates a new Word document
            WordDocument document = new WordDocument();
            document.EnsureMinimal();
            //Html string to be inserted 
            string htmlstring = "<html><body><h1>test</h1></body></html>";
            //Validates the Html string 
            bool isValidHtml = document.LastSection.Body.IsValidXHTML(htmlstring, XHTMLValidationType.Transitional);

            Console.WriteLine("Is Valid: {0}", isValidHtml); //False
            Console.ReadLine();

Attachment: syncfusionwordfail_d49d17c.zip


HC Hemalatha Chiranjeevulu Syncfusion Team December 2, 2020 06:22 PM UTC

Hi Pascal,

Thank you for your update.

From the given details, we found that you are using an ASP.NET Core application at your end. On the ASP.NET Core platform, the Transitional and Strict XHTML validation options are not supported; this is a limitation. So, we suggest using XHTMLValidationType.None at your end.

Please let us know if you have any further questions.

Regards,
Hemalatha C
 



PS Pascal Seifert December 3, 2020 08:26 AM UTC

Thanks for your reply.
If you read this thread carefully, you will see we are going in circles here.
As i already said, validation mode "None" will not validate the input HTML properly for the Insert-Method.

i have attached a sample for you, where the validate method will return true with validation mode "None" and the insert will fail.
This leads, again, to the initial bug.

Code:
            //Creates a new Word document
            WordDocument document = new WordDocument();
            document.EnsureMinimal();

            //Html string to be inserted 
            string htmlstring = "<html><body>test<br>break</body></html>";

            //Validates the Html string 
            bool isValidHtml = document.LastSection.Body.IsValidXHTML(htmlstring,  XHTMLValidationType.None);

            //When the Html string passes validation, it is inserted to the document 
            if (isValidHtml)
            {
                //Inserts the Html string as the first item of the first paragraph in the document
                document.Sections[0].Body.InsertXHTML(htmlstring, 0, 0); //EXCEPTION: DocIO support only welformatted xhtml 
            }

            //Saves and closes the document 
            document.Save(File.OpenWrite("sample.docx"), FormatType.Automatic);
            document.Close();

Attachment: syncfusionwordfail2_19231db9.zip


HC Hemalatha Chiranjeevulu Syncfusion Team December 4, 2020 05:25 PM UTC

Hi Pascal,

Thank you for your update.

We can reproduce the reported issue at our end, and we suspect it to be a defect. We will validate this issue and update you with more details on 8th December 2020.

For now, kindly use the modified HTML below at your end to achieve your requirement:


 
//Html string to be inserted 
string htmlstring = "<html><body>test<br/>break</body></html>";
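
When the HTML comes from a source that cannot be edited by hand, the same replacement could be applied in code before the string is passed to InsertXHTML. The following is a minimal sketch under that assumption: `BrNormalizer` and `CloseBareBrTags` are hypothetical names, not DocIO APIs, and the pattern only covers attribute-free `<br>` tags. Other unclosed void elements (for example `<hr>` or `<img>`) are not handled.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of DocIO.
public static class BrNormalizer
{
    // Rewrites bare <br> tags (any letter case, optional inner whitespace)
    // as self-closing <br />. Tags already written as <br/> are left unchanged.
    public static string CloseBareBrTags(string html)
    {
        return Regex.Replace(html, @"<br\s*>", "<br />", RegexOptions.IgnoreCase);
    }
}
```

A call such as `BrNormalizer.CloseBareBrTags(htmlstring)` would then be placed immediately before the validation and insert calls.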
 

Please let us know if you have any other questions.

Regards,
Hemalatha C
 



HC Hemalatha Chiranjeevulu Syncfusion Team December 7, 2020 05:40 PM UTC

Hi Pascal,

Thank you for your patience.


We have confirmed that the reported issue “XHTMLValidation validates improperly when validating None in HTML conversion” is a defect. Since you are using our weekly NuGet release (v18.3.0.52), we plan to include this fix in our weekly NuGet release on 22nd December 2020.

To track the status of the reported issue, please use the following feedback link.
https://www.syncfusion.com/feedback/20401/xhtmlvalidation-validates-improperly-when-validating-none-in-html-conversion

Note: If you require a patch for this issue in any other version, please kindly let us know the currently installed version, so that we can provide a patch in that version based on our SLA policy.

Please let us know if you have any other questions.

Regards,
Hemalatha C



PS Pascal Seifert December 7, 2020 05:46 PM UTC

thanks so much, the nuget release is just fine!
also, the feedback link you attached leads to a 404 and is not working.


MJ Mohanaselvam Jothi Syncfusion Team December 8, 2020 09:16 AM UTC

Hi Pascal,

Sorry for the inconvenience.

You should now be able to open the feedback link to track the status of the bug report.

We have planned our main release for mid-December, and we do not have a weekly NuGet release on 22nd December 2020. Also, our release work is at its final stage, so it is not feasible to include your bug fix in the main release at the last minute. We need some time to fix the issue and run automation testing at our side. So, we plan to include your bug fix in our upcoming weekly NuGet release on 29th December 2020.

To track the status of the reported issue, please use the following feedback link.
https://www.syncfusion.com/feedback/20401/xhtmlvalidation-validates-improperly-when-validating-none-in-html-conversion

Please let us know if you have any other questions.

Regards,
Mohanaselvam J 



HC Hemalatha Chiranjeevulu Syncfusion Team December 31, 2020 02:57 PM UTC

Hi Pascal,

Thank you for your patience.

We have included the fix for the reported issue, "XHTMLValidation validates improperly when validating None in HTML conversion", in our latest weekly NuGet release (v18.4.0.32), which is now available. You can use the latest NuGet packages to resolve this issue at your end.

The status of this task can be tracked through below link:
https://www.syncfusion.com/feedback/20401/xhtmlvalidation-validates-improperly-when-validating-none-in-html-conversion

Note: This fix will be included in our 2020 Volume 4 SP release, which is tentatively expected to be available at the end of January 2021.

Please let us know if you have any other questions.

Regards,
Hemalatha C
 


Marked as answer