PDF Conformance and error handling

Question

Hello,

I support an application that uses the Syncfusion libraries to merge PDFs that end users upload. I've been working on improving error handling for this app when there are corrupt or invalid PDFs, by preventing them from being uploaded. We're using the .NET Syncfusion packages in our app. I was wondering if a PDF that returns conformance as "None" should be handled similarly to how a corrupt PDF would be handled, or if there are limitations to what gets returned and when "None" is returned?

I've been trying to add some error handling for situations where the syntax analyzer doesn't catch a bad PDF. That is, this code:

    //Load the PDF document
    Stream docStream = new MemoryStream(pdfArray);
    //Create a new instance for the PDF analyzer
    PdfDocumentAnalyzer analyzer = new PdfDocumentAnalyzer(docStream);
    //Get the syntax errors
    SyntaxAnalyzerResult result = analyzer.AnalyzeSyntax();
    //Check whether the document is corrupted or not
    if (result.IsCorrupted)
    {
        // Code to handle these PDFs
    }

does not report the file as corrupted, but there are other syntax errors that prevent the Syncfusion libraries from working with the PDF, and an error is thrown when we try to load it and use the Form value. I'm working on catching both cases and returning a message to end users so that they can work with our technical staff to fix the file or upload a valid PDF.

I've noticed that if I add this in an else block after the above if (result.IsCorrupted):

    else
    {
        PdfLoadedDocument testDoc = new PdfLoadedDocument(pdfArray);
        PdfConformanceLevel pdfConformance = testDoc.Conformance;
        if (pdfConformance.ToString() == "None")
        {
            analyzer.Close();
            // false return indicates issue with PDF
            return false;
        }
        analyzer.Close();
        return true;
    }

it catches a bunch of false negatives along with truly bad PDFs that have invalid syntax or don't meet conformance but are also not caught by the analyzer code. The attached files in the zip are a couple of examples:

- organized-labor-workforce-empty: an example that doesn't get caught by the document analyzer but can be caught when checking for conformance. This is one that errors out in other parts of our code that use the Syncfusion libraries, and when run through a PDF validator it is reported as not a valid PDF.
- app-info-lifecycle-mgmt_brochure_1947.pdf: a random example PDF that is not corrupt and meets the PDF 1.3 standard according to this site: https://www.pdf-online.com/osa/validate.aspx, but returns "None" for PDF conformance when using the else code. This is an example of the false negatives I'm trying to avoid.

Will I not be able to use conformance as a way to prevent bad PDFs from being uploaded, because it returns "None" for valid PDFs? When does the conformance check return "None"? Is there anything I should be doing with PDFs before checking the conformance value to ensure the check works properly?

Thanks for your help.

Attachment: samplepdfs_cadfbf8f.zip

Gowthamraj Kumar · Answer

Hi Megan, 
 
Thank you for contacting Syncfusion support.

The Syncfusion PDF Library provides APIs to find out whether a PDF file is corrupt by analyzing its structure and syntax. It also provides APIs to repair basic cross-reference offset-level corruption in PDF files. You can use these to avoid unexpected behavior while processing PDF files in your .NET applications.

Please refer to the links below for more information:
UG: https://help.syncfusion.com/file-formats/pdf/working-with-document#find-corrupted-pdf-document
KB: https://www.syncfusion.com/kb/10760/how-to-find-corrupted-pdf-document-using-c-and-vb-net
Blogs: https://www.syncfusion.com/blogs/post/how-to-find-corrupted-pdf-files-in-c-sharp.aspx

The Conformance property only reports the conformance level of an existing PDF document. If the document does not have any conformance level, it returns None. We also do not support reporting error details when checking a document's conformance against PDF standards.

Regards,
Gowthamraj K

Megan Anderson · Answer

Thanks Gowthamraj.

So I think I am technically doing the first half partially right, but I'm still not clear on the second.

Does the conformance level return "None" only if it doesn't meet one of these standards?: Enum PdfConformanceLevel - FileFormats API Reference | Syncfusion

So when I use a conformance-level check as a 2nd check, it might return "None" for PDF standards not officially supported by Syncfusion, even though other Syncfusion code might still work OK with these other standards? E.g. if it is not a PDF/A or PDF/X type but one of these others: 8 Types of PDF Standards – Each Serves a Unique Purpose (marconet.com), like PDF, PDF/E, PDF/UA, it could return "None" but still be a valid PDF and still technically might work with the Syncfusion libraries?

If it helps, the check I was trying to incorporate in addition to the analyzer code was because we had a PDF with a syntax error that was NOT caught by this sample code: Working with Document | Syncfusion

Later on it triggered an exception when we tried to access the Form value of the PdfLoadedDocument elsewhere in our application. I'm trying to catch files like that, though so far we've only had one in the past year. It was through Syncfusion support that we discovered a PDF validator was returning syntax errors not caught by the Analyzer, ones that would not be fixed with the OpenAndRepair option, and that was when I started trying to figure out if there's anything else we can add to prevent that type of PDF upload.

So the root of what I am trying to do is see if there is a way for me to more gracefully prevent similar PDFs (i.e. ones not caught by the Analyzer but that are still bad PDFs) from being used in the application, and route users to our IT support so we can figure out what is wrong.

But the issue I am finding is that "None" is returned for PDFs that otherwise have no issue being used with the Syncfusion libraries. With test PDFs I am getting a lot of "false negatives": PDFs that load without issue and work with the Merge are returning "None". So it looks like:

1.) I can't rely on conformance as a fallback for preventing bad PDFs from being used, and
2.) if I wanted to use conformance as a 2nd check, I might also have to try to convert the files to PDF/A first if they are regular PDFs?: Working with PDF conformance | Syncfusion
https://help.syncfusion.com/file-formats/pdf/working-with-pdf-conformance#pdf-to-pdfa-conversion

And then see if the 2nd check works with my bad file (e.g. it won't be able to convert the bad one to PDF/A, but the other "false negative" regular PDFs might convert)?

Gowthamraj Kumar · Answer

Hi Megan, 
 
Thank you for your update. 
 
We are currently analyzing your requirement on our end and will provide further details by August 6th, 2021.

Regards,
Gowthamraj K

Gowthamraj Kumar · Answer

Hi Megan, 
 
Thank you for your patience. 




> Does the conformance level return "None" only if it doesn't meet one of these standards?: Enum PdfConformanceLevel - FileFormats API Reference | Syncfusion
>
> So when I am trying to use this as a 2nd check which involves checking a PDF conformance level, it might return "none" for PDF standards not officially supported by SyncFusion? Even though other SyncFusion code might still work OK with these other standards? e.g. if it is not a PDF/A or PDF/X type but one of these others: 8 Types of PDF Standards – Each Serves a Unique Purpose (marconet.com) like PDF, PDF/E, PDF/UA, it could return "none" but still be a valid PDF and still technically might work with the SyncFusion libraries?

At present, our PDF library supports only some conformance levels. The supported levels are listed in our documentation; please refer to the link below:
https://help.syncfusion.com/file-formats/pdf/working-with-pdf-conformance

As mentioned earlier, the Conformance property returns the conformance level of an existing PDF document. It returns "None" if the document has no conformance level (a normal PDF) or uses a conformance standard that is not in the list above.
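
To illustrate the behavior described above, here is a minimal sketch (not an official Syncfusion sample) that reads the Conformance property and treats "None" as information rather than as a failure. The class and method names (ConformanceCheck, DescribeConformance) are hypothetical, and the namespaces are assumed from the Syncfusion documentation. Comparing against the enum member PdfConformanceLevel.None instead of the string "None" is a suggested tidy-up, not something the thread requires.

```csharp
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

public static class ConformanceCheck
{
    // Hypothetical helper: describes the conformance level without rejecting the file.
    public static string DescribeConformance(byte[] pdfArray)
    {
        PdfLoadedDocument document = new PdfLoadedDocument(pdfArray);
        try
        {
            PdfConformanceLevel level = document.Conformance;

            // "None" covers both ordinary PDFs and unsupported standards,
            // so on its own it says nothing about whether the file is valid.
            return level == PdfConformanceLevel.None
                ? "No supported conformance level (ordinary PDF or unsupported standard)"
                : "Conformance level: " + level;
        }
        finally
        {
            // Release the document whether or not reading the property succeeded.
            document.Close(true);
        }
    }
}
```

With a check like this, the app-info-lifecycle-mgmt_brochure_1947.pdf example from the question would be reported as having no supported conformance level, but it would not be blocked from upload for that reason alone.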
 


> If it helps, the check I was trying to incorporate in addition to the analyzer code was because we had a PDF with a syntax error that was NOT caught by this sample code: Working with Document | Syncfusion
>
> Later on it was triggering an exception when trying to access the Form value of the PdfLoadedDocument elsewhere in our application. I'm trying to catch files like that, though so far we've only had one in the past year. It was through SyncFusion support that we discovered a PDF validator was returning syntax errors not caught by the Analyzer, ones that would not be fixed with the OpenAndRepair option, and that was when I was trying to figure out if there's anything else we can add to prevent that type of PDF upload.

As mentioned earlier, PdfDocumentAnalyzer detects only basic syntax errors in a PDF document. It cannot detect more complex syntax errors.

If you are facing an exception while accessing form values from a loaded PDF document, please share the document, the exception details, and a complete code snippet so that we can reproduce the exception on our end. This will help us analyze the issue and assist you further.


> So the root of what I am trying to do is see if there is a way for me to more gracefully prevent similar PDFs (i.e. ones not caught by the Analyzer but that are still bad PDFs) from being used in the application, and route users to our IT support so we can figure out what is wrong. But the issue I am finding is the "None" returns with PDFs that otherwise have no issue being used with the SyncFusion libraries - it is looking like with test PDFs I am getting a lot of "false negatives." PDFs that can load without issue and work with the Merge are returning "None," so it is looking like I either: 1.) can't rely on using that as a fallback for preventing bad PDFs from being used, and 2.) if I wanted to use conformance as a 2nd check, I might also have to try to convert the files to PDF/A first if they are regular PDFs?: Working with PDF conformance | Syncfusion
>
> https://help.syncfusion.com/file-formats/pdf/working-with-pdf-conformance#pdf-to-pdfa-conversion
>
> And then see if the 2nd check works with my bad file (e.g. it won't be able to convert the bad one to PDF/A but the other "false negative" regular PDFs might be able to)?

We can convert a normal PDF to a PDF/A document by using our library, but we cannot identify a bad PDF (one that PdfDocumentAnalyzer does not catch) on our end. We do not support validating a PDF document against the PDF standards, so we cannot proceed further on this.
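
Since neither the analyzer nor the conformance level can flag every bad file, one possible fallback is to attempt the same operations the application performs later and catch any exception at upload time. The following is a minimal sketch under that assumption, not an official Syncfusion recommendation. The names PdfUploadValidator and IsUsable are hypothetical; the Syncfusion calls are the ones already used in this thread, plus Close(true) on the loaded document.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

public static class PdfUploadValidator
{
    // Hypothetical helper: returns true only if the analyzer reports no corruption
    // and the document can be loaded and its Form read without an exception.
    public static bool IsUsable(byte[] pdfArray, out string reason)
    {
        reason = null;

        // First check: basic syntax analysis, as in the original question.
        using (Stream docStream = new MemoryStream(pdfArray))
        {
            PdfDocumentAnalyzer analyzer = new PdfDocumentAnalyzer(docStream);
            try
            {
                SyntaxAnalyzerResult result = analyzer.AnalyzeSyntax();
                if (result.IsCorrupted)
                {
                    reason = "The analyzer reported the file as corrupted.";
                    return false;
                }
            }
            finally
            {
                analyzer.Close();
            }
        }

        // Second check: load the document and read the Form property,
        // the step that failed later in the application described above.
        PdfLoadedDocument document = null;
        try
        {
            document = new PdfLoadedDocument(pdfArray);
            PdfLoadedForm form = document.Form; // may be null when the PDF has no form
            return true;
        }
        catch (Exception ex)
        {
            reason = "The file could not be loaded: " + ex.Message;
            return false;
        }
        finally
        {
            if (document != null)
            {
                document.Close(true);
            }
        }
    }
}
```

A caller could show the reason text to the end user and route them to IT support when IsUsable returns false. This only catches failures in the operations it exercises, so it is a guard for known failure paths rather than a PDF validator.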
 
 
 
Regards, 
Gowthamraj K

Peter Groft · Answer

PDF/A is a version of PDF designed for archiving. It follows ISO standard 19005 and its purpose is to ensure that documents appear the same on any device accessing them in the long term.

There are a number of PDF/A sub-standards, and each has specific requirements to meet them:

PDF/A-1 (PDF/A-1a, PDF/A-1b)

PDF/A-2 (PDF/A-2a, PDF/A-2b, PDF/A-2u)

PDF/A-3 (PDF/A-3a, PDF/A-3b, PDF/A-3u)

PDF/A-4 (published in 2020 as ISO 19005-4)

PDF/A-1 specifies basic (B) and accessible (A) levels of conformance and has standards on colors, annotations, fonts and more.

PDF/A-2 includes specifications for JPEG 2000, layers, PDF packages, and attachments. It also includes a “u” variant which relates to Unicode and ensures that a user can perform a text search/extraction.

PDF/A-3 is largely similar to A-2 but allows non-PDF file types to be embedded or attached to conforming documents.

PDF/A-4 was published in 2020 and is based on PDF 2.0.

Hope You Find This Useful,
Peter

Irfana Jaffer Sadhik · Answer

Thank you for sharing the details

Alex Wood · Answer

Great topic! Adding chat functionality can really improve user interaction, especially for businesses that rely on quick responses. I’ve used SignalR with Syncfusion controls before, and it worked really well for creating a real-time chat interface.

Kirthika Vijayagiri · Answer

Thank you for the suggestions.
