Word to PDF to PDF/A (ASP.NET Core)

Hi,

I've been looking at this thread: https://www.syncfusion.com/forums/172278/convert-existing-pdf-to-pdf-a-document
I seem to be able to reproduce the issue described there with a sample project (attached).

We would like to convert a Word file (DOCX) to a PDF/A file. To my knowledge, Syncfusion exposes no convertToPdfa method directly on WordDocument, so I'm doing this instead:

1. Convert DOCX to PDF:

static void WordToPdf()
{
    using (FileStream fileStream = new FileStream(Path.GetFullPath(DocxPath), FileMode.Open))
    {
        //Loads an existing Word document.
        using (WordDocument wordDocument = new WordDocument(fileStream, Syncfusion.DocIO.FormatType.Automatic))
        {
            //Creates an instance of DocIORenderer.
            using (DocIORenderer renderer = new DocIORenderer())
            {
                //Sets Chart rendering Options.
                renderer.Settings.ChartRenderingOptions.ImageFormat = ExportImageFormat.Jpeg;
                //Converts Word document into PDF document.
                using (PdfDocument pdfDocument = renderer.ConvertToPDF(wordDocument))
                {
                    //Saves the PDF file to file system.
                    using (FileStream outputStream = new FileStream(Path.GetFullPath(PdfPath), FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite))
                    {
                        pdfDocument.Save(outputStream);
                    }
                }
            }
        }
    }
}

2. Then convert PDF to PDF/A:

static void PdfToPdfa()
{
    //Load an existing PDF document.
    using (FileStream docStream = new FileStream(PdfPath, FileMode.Open, FileAccess.Read))
    {
        using (PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream))
        {
            //Sample level font event handling.
            loadedDocument.SubstituteFont += LoadedDocument_SubstituteFont;

            //Convert the loaded document into PDF/A document.
            loadedDocument.ConvertToPDFA(PdfConformanceLevel.Pdf_A1B);

            using (MemoryStream memoryStream = new MemoryStream())
            {
                //Save the document.
                loadedDocument.Save(memoryStream);

                //Close the document.
                loadedDocument.Close(true);

                memoryStream.Position = 0;

                using (FileStream fileStream = new FileStream(PdfaPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    memoryStream.WriteTo(fileStream);
                }
            }
        }
    }
}


Result is:

- The generated PDF/A file claims to be PDF/A but is not compliant (verified with Acrobat Pro and veraPDF)

- PDF file is OK, but inside the PDF/A file, some text is missing

Maybe there is something wrong with the font?

Could you please tell me if I'm doing something wrong or if there is a bug?


Thanks in advance


Attachment: PdfA_e26896df.zip

4 Replies 1 reply marked as answer

IJ Irfana Jaffer Sadhik Syncfusion Team December 7, 2022 04:52 AM UTC

On further analysis: when converting a PDF to PDF/A conformance, we embed all the fonts used in the existing PDF document. During that step, a font is taken from the font cache collection if the same font is already cached, and this causes the preservation issue (the missing text). You can work around this at the sample level by clearing the font cache before converting the PDF to PDF/A conformance. Please use the code snippet below to clear the font cache.

 

 

static void PdfToPdfa()
{
    //Clear the cached fonts before the PDF/A conversion.
    PdfDocument.ClearFontCache();

    //Load an existing PDF document.
    using (FileStream docStream = new FileStream(PdfPath, FileMode.Open, FileAccess.Read))
    {
        using (PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream))
        {
            //Sample level font event handling.
            loadedDocument.SubstituteFont += LoadedDocument_SubstituteFont;

            //Convert the loaded document into PDF/A document.
            loadedDocument.ConvertToPDFA(PdfConformanceLevel.Pdf_A1B);

            using (MemoryStream memoryStream = new MemoryStream())
            {
                //Save the document.
                loadedDocument.Save(memoryStream);

                //Close the document.
                loadedDocument.Close(true);

                memoryStream.Position = 0;

                //FileMode.Create truncates an existing file, so no bytes from a previous, longer output are left behind.
                using (FileStream fileStream = new FileStream(PdfaPath, FileMode.Create, FileAccess.ReadWrite))
                {
                    memoryStream.WriteTo(fileStream);
                }
            }
        }
    }
}

 

Kindly try the above solution on your end and let us know if you need any further assistance on this.


Marked as answer

BB Benjamin Boutrois December 7, 2022 03:21 PM UTC

Hi,


Thank you for your answer.


I tried adding the PdfDocument.ClearFontCache() call and it fixed the missing text.

However, the file is still not compliant with PDF/A-1B (see the attached veraPDF report).


Attachment: verapdfReport_eab7aaa0.zip



IJ Irfana Jaffer Sadhik Syncfusion Team December 8, 2022 07:51 AM UTC

We suspect that the document contains the trial watermark, which is why the conformance check reports it as invalid. This is not a defect in the conversion itself. To resolve it, register a license key so that the trial watermark is no longer added.

Please use the code snippet below to register the license:

Syncfusion.Licensing.SyncfusionLicenseProvider.RegisterLicense("Your License Key");
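
As a rough sketch of where this call might go in the sample project: the snippet below assumes that the WordToPdf and PdfToPdfa methods from the earlier posts live in another part of the same Program class, and that the key string is replaced with a real license key. The class layout is only an illustration, not the structure of the attached sample.

```csharp
using Syncfusion.Licensing;

// Assumption: WordToPdf() and PdfToPdfa() are the methods posted earlier
// in this thread, defined in another part of this partial class.
static partial class Program
{
    static void Main()
    {
        // Register the license first, before any document is created,
        // so that no trial watermark ends up in the generated files.
        SyncfusionLicenseProvider.RegisterLicense("Your License Key");

        // Step 1: DOCX to PDF.
        WordToPdf();

        // Step 2: PDF to PDF/A-1B (clears the font cache before converting).
        PdfToPdfa();
    }
}
```

After running both steps with a registered key, the output can be checked again with veraPDF to confirm whether the watermark was the remaining cause of the failure.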