APIs to spot font and style differences between 2 PDFs

Hi Team,

Do you have any APIs to find differences in the following between two documents?

  1. Font name and size
  2. Color
  3. Bold
  4. Italic
  5. Underlying URL of hyperlinks

I have uploaded two sample PDF files. Please guide us on style comparison.

Attachment: PDFs_1302e231.zip

5 Replies

JT Jeyalakshmi Thangamarippandian Syncfusion Team July 24, 2024 02:10 PM UTC

Hi Mahesh,

You can use the TextLine API to extract text together with its color, font name, font size, bounds, and other text properties. For further details, please refer to the UG documentation:

Working with Text Extraction | Syncfusion

Class TextLine - API Reference (syncfusion.com)
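
Once the per-word style data has been extracted from both PDFs, the comparison step itself does not depend on any particular library. The following is a minimal sketch in Python, assuming each word has already been reduced to a plain record of text, font name, font size, color, bold, and italic. The `WordStyle` type and the `diff_styles` helper are hypothetical names used for illustration and are not part of the Syncfusion API. Words are paired by position, which assumes both documents contain the same word sequence.

```python
from dataclasses import dataclass, fields
from itertools import zip_longest


@dataclass(frozen=True)
class WordStyle:
    """Hypothetical record holding the style of one extracted word."""
    text: str
    font_name: str
    font_size: float
    color: tuple  # (R, G, B)
    bold: bool = False
    italic: bool = False


def diff_styles(old, new):
    """Pair words by position and report every attribute that differs.

    Returns a list of (word_index, attribute, old_value, new_value).
    A word present in only one document is reported with attribute "word".
    """
    diffs = []
    for index, (a, b) in enumerate(zip_longest(old, new)):
        if a is None or b is None:
            diffs.append((index, "word",
                          a.text if a else None,
                          b.text if b else None))
            continue
        for field in fields(WordStyle):
            old_value = getattr(a, field.name)
            new_value = getattr(b, field.name)
            if old_value != new_value:
                diffs.append((index, field.name, old_value, new_value))
    return diffs


if __name__ == "__main__":
    first = [WordStyle("Hello", "Arial", 12.0, (0, 0, 0))]
    second = [WordStyle("Hello", "Arial", 12.0, (0, 0, 0), bold=True)]
    for entry in diff_styles(first, second):
        print(entry)
```

If the two documents can differ in wording as well as style, positional pairing is not enough, and the word sequences would first need to be aligned (for example with a sequence-diff algorithm) before comparing attributes.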

As of now, to get hyperlinks, you should use PdfLoadedAnnotation API. For further details, please refer to the UG documentation:

Working with Annotations | Syncfusion

PdfLoadedUriAnnotation Class - C# PDF Library API Reference | Syncfusion
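
After the link targets have been read from the annotations of both PDFs, they can be compared as plain data. The following is a minimal sketch in Python, assuming the links of each document are already available as `(page_number, uri)` pairs. The `diff_links` helper and the example URLs are hypothetical and are not part of the Syncfusion API.

```python
from collections import Counter


def diff_links(old_links, new_links):
    """Compare two lists of (page_number, uri) pairs.

    Returns (removed, added): links found only in the old document and
    links found only in the new one. Counter keeps duplicates, so a link
    that appears twice in one document and once in the other is reported.
    """
    old_count = Counter(old_links)
    new_count = Counter(new_links)
    removed = sorted((old_count - new_count).elements())
    added = sorted((new_count - old_count).elements())
    return removed, added


if __name__ == "__main__":
    old = [(1, "https://a.example"), (2, "https://b.example")]
    new = [(1, "https://a.example"), (2, "https://c.example")]
    removed, added = diff_links(old, new)
    print("removed:", removed)
    print("added:", added)
```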


Regards,

Jeyalakshmi T



MM Mahesh Machina replied to Jeyalakshmi Thangamarippandian July 24, 2024 05:33 PM UTC

Hi Jeyalakshmi,

The Syncfusion APIs for extracting text from a PDF work fine if the PDF has only plain text.

Please find the attached sample. Where the PDF has an unordered list (bullet points), the text is not read as it is: some characters are missing, or a question mark or junk character appears in their position.

"Existing" is read as "Exising".


Attachment: PDFReadingError_382c822c.zip


JT Jeyalakshmi Thangamarippandian Syncfusion Team July 25, 2024 12:07 PM UTC

Hi Mahesh,

We tried to replicate the problem on our end using our test documents, but we were not able to reproduce it. We suspect that the issue is document-specific. Therefore, we request you to share the input PDF document with us so that we can replicate the problem, analyze it further, and provide you with a prompt solution.

Sample:

https://www.syncfusion.com/downloads/support/directtrac/general/ze/Console_Sample575139362


Regards,

Jeyalakshmi T



MM Mahesh Machina replied to Jeyalakshmi Thangamarippandian July 26, 2024 10:31 AM UTC

Apologies, I forgot to attach the actual PDF we are working with.

Please find it attached.


Attachment: PDfVersion4Modified_74cea5cd.zip


BV Brundha Velusamy Syncfusion Team July 29, 2024 11:41 AM UTC

Hi Mahesh,

After a thorough review of the provided document, we found that the word appears in the document itself as "Exisng" instead of "Existing." Consequently, our output is as expected. For your reference, we have attached a screenshot of the document. To replicate the issue on our end, we kindly request you to share the problematic input document. This will assist us in further analysis and allow us to provide a prompt solution.


Regards,

Jeyalakshmi T

