
PdfLoadedPage.ExtractText() giving encoded text

I have a PDF file, generated by a third-party system, whose font is TrueType (CID) with Identity-H encoding. I am trying to parse it using the PdfLoadedPage.ExtractText() method.

I'm able to parse PDFs from other sources successfully using Syncfusion, but these PDFs return only strange characters. Other threads on the forum say that Identity-H encoded PDFs should return results just fine, but I can't see what configuration setting I might be missing in order to read this file...


Could you please let me know why ExtractText() does not work here, and whether there is a workaround? FYI, I am using Syncfusion 14.3451.0.49.

Attached is a PDF that I am trying to read with the code below:

            using (var loadedDocument = new PdfLoadedDocument(filepath))
            {
                var page = loadedDocument.Pages[0];
                var text = page.ExtractText();
                loadedDocument.Close();
            }

The variable text ends up with:

3DJH 
                       $QHVWKHVLD %LOOLQJ 6XPPDU
                              0$,1
3DWLHQW ,QIRUPDWLRQ
3DWLHQW
V 1DPH 
5(*,675$7,21 0$5<$11
661 
$GGUHVV 
 DQ VWUHHW *HUPDQWRZQ 0' 
3KRQH 1XPEHU     $JH 
 DUV
    '2% 

    6H[ ....
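
For what it's worth, the output above looks like a constant character shift rather than random noise: adding 29 to each character code turns "3DJH" into "Page" and "0$,1" into "MAIN". The sketch below is only a diagnostic for confirming that pattern on an extracted string. It assumes the offset of 29 holds for the whole document and that whitespace passes through unchanged; ShiftDecoder and Decode are hypothetical names, not part of the Syncfusion API, and this is not a substitute for a correct extraction.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper; not part of the Syncfusion API.
public static class ShiftDecoder
{
    // Assumption: every non-whitespace character is offset by the same amount
    // (29 for the sample output in this thread).
    public static string Decode(string garbled, int offset = 29)
    {
        var sb = new StringBuilder(garbled.Length);
        foreach (char c in garbled)
        {
            // Leave spaces and line breaks as they are; shift everything else.
            sb.Append(char.IsWhiteSpace(c) ? c : (char)(c + offset));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Decode("3DJH"));                      // Page
        Console.WriteLine(Decode("$QHVWKHVLD %LOOLQJ 6XPPDU")); // Anesthesia Billing Summar
        Console.WriteLine(Decode("3DWLHQW ,QIRUPDWLRQ"));       // Patient Information
    }
}
```

If the shifted text reads as plain English, the extractor is most likely returning glyph indices for this font instead of Unicode values, which would point to a mapping problem inside the library rather than a missing configuration setting.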

Attachment: Sample_PDF_25176659.zip

3 Replies

Balasubramanian Sundararajan, Syncfusion Team, February 9, 2017 11:30 AM UTC

Hi Jeff, 
 
Thank you for using Syncfusion products. 
 
The reported issue in extracting text from the attached PDF document has been fixed in our Essential Studio 2016 Volume 4 SP1. Please upgrade to our latest version from the following link to resolve the issue. 
 
 
We have also extracted the text from the attached PDF document with our latest Essential Studio version. The resulting output can be downloaded from the following link: 
 
  
Please let us know if you need further assistance. 
 
Thanks, 
Balasubramanian S    



Jeff, February 14, 2017 02:23 PM UTC

Thank you very much! I've updated to last December's release and am now able to parse the document.  What a difference a version makes...

Issue has been resolved.


Balasubramanian Sundararajan, Syncfusion Team, February 15, 2017 08:36 AM UTC

Hi Jeff, 
 
We are glad that the issue has been resolved at your end.  
 
Please let us know if you need further assistance. 
 
Thanks, 
Balasubramanian S 

