
PdfLoadedPage.ExtractText() giving encoded text

Thread ID: 128741
Created: Feb 8, 2017 01:18 PM
Updated: Feb 15, 2017 03:36 AM
Platform: WPF
Replies: 3
Tags: PdfViewer
Jeff
Asked On February 8, 2017 01:18 PM

I have a PDF file generated by a third-party system, with font encoding TrueType (CID) Identity-H, that I am trying to parse using the PdfLoadedPage.ExtractText() method.

I'm able to parse PDFs from other sources using Syncfusion successfully, but these PDFs only return strange characters. I see other threads on the forum saying that Identity-H encoded PDFs should return results just fine, but I can't see what configuration setting I might be missing in order to read this file...

Could you please let me know why ExtractText() would not work here, and whether there is a workaround? FYI, I am using Syncfusion 14.3451.0.49.

Attached is a PDF I am trying to read using the code below:

            // Requires: using Syncfusion.Pdf.Parsing;
            using (var loadedDocument = new PdfLoadedDocument(filepath))
            {
                // Extract the text of the first page only.
                var page = loadedDocument.Pages[0];
                var text = page.ExtractText();
                loadedDocument.Close();
            }

The variable text ends up with:

3DJH
                       $QHVWKHVLD %LOOLQJ 6XPPDU
                              0$,1
3DWLHQW ,QIRUPDWLRQ
3DWLHQW
V 1DPH 
5(*,675$7,21 0$5<$11
661 
$GGUHVV 
 DQ VWUHHW *HUPDQWRZQ 0' 
3KRQH 1XPEHU     $JH 
 DUV
    '2% 

    6H[ ....

Attachment: Sample_PDF_25176659.zip
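
A note on the output above: the "strange characters" are not random. In the lines shown, each printable character sits exactly 29 code points below the expected ASCII character, so "3DJH" corresponds to "Page". That is consistent with the extractor returning raw glyph codes from the Identity-H font instead of mapped Unicode text, though this is an inference from the sample and not something the thread confirms. The minimal sketch below only illustrates the pattern for diagnosis. The class name, the `Shift` helper, and the offset constant are hypothetical, the offset is specific to this sample, and this is not a Syncfusion API or a general workaround.

```csharp
using System;
using System.Text;

public static class GlyphOffsetDemo
{
    // Offset observed in this thread's sample output (assumption: it only
    // applies to this particular font and document).
    private const int Offset = 29;

    // Shifts every non-whitespace character up by Offset code points.
    public static string Shift(string garbled)
    {
        var sb = new StringBuilder(garbled.Length);
        foreach (char c in garbled)
        {
            // Whitespace in the extracted text is kept unchanged; only the
            // printable characters show the constant offset.
            sb.Append(char.IsWhiteSpace(c) ? c : (char)(c + Offset));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Shift("3DJH"));                // Page
        Console.WriteLine(Shift("$QHVWKHVLD %LOOLQJ"));  // Anesthesia Billing
        Console.WriteLine(Shift("3DWLHQW ,QIRUPDWLRQ")); // Patient Information
    }
}
```

Some characters in the original text appear to be missing from the extracted output entirely, so a shift like this cannot fully recover the page. It is useful only for confirming that the problem is a character mapping issue.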

Balasubramanian Sundararajan [Syncfusion]
Replied On February 9, 2017 06:30 AM

Hi Jeff, 
 
Thank you for using Syncfusion products.
 
The reported issue in extracting the text from the attached PDF document has been fixed in our Essential Studio 2016 Volume 4 SP1. We request you to upgrade to our latest version from the following link to get the issue resolved.

We have also extracted the text from the attached PDF document with our latest Essential Studio version, and the resulting output can be downloaded from the following link:
 
  
Please let us know if you need further assistance. 
 
Thanks, 
Balasubramanian S    


Jeff
Replied On February 14, 2017 09:23 AM

Thank you very much! I've updated to last December's release and am now able to parse the document.  What a difference a version makes...

Issue has been resolved.

Balasubramanian Sundararajan [Syncfusion]
Replied On February 15, 2017 03:36 AM

Hi Jeff, 
 
We are glad that the issue has been resolved at your end.  
 
Please let us know if you need further assistance. 
 
Thanks, 
Balasubramanian S 

