I have a PDF file, generated by a third-party system, that uses TrueType (CID) fonts with Identity-H encoding. I am trying to parse it with the PdfLoadedPage.ExtractText() method.
I can parse PDFs from other sources with Syncfusion successfully, but these PDFs return only strange characters. Other threads on this forum say that Identity-H encoded PDFs should extract fine, but I can't see what configuration setting I might be missing to read this file.
Could you let me know why ExtractText() does not work here, and whether there is any workaround? FYI, I am using Syncfusion 14.3451.0.49.
Attached is a PDF I am trying to read with the code below:
using Syncfusion.Pdf.Parsing;

using (var loadedDocument = new PdfLoadedDocument(filepath))
{
    // Extract the text of the first page only
    var page = loadedDocument.Pages[0];
    var text = page.ExtractText();
    loadedDocument.Close();
}
The variable text ends up with:

    3DJH
    $QHVWKHVLD %LOOLQJ 6XPPDU
    0$,1
    3DWLHQW ,QIRUPDWLRQ
    3DWLHQW
    V 1DPH
    5(*,675$7,21 0$5<$11
    661
    $GGUHVV
    DQ VWUHHW *HUPDQWRZQ 0'
    3KRQH 1XPEHU $JH
    DUV
    '2%
    6H[ ....
Attachment:
Sample_PDF_25176659.zip
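Looking more closely, the output above does not seem random: each character appears to be the intended character minus a fixed 29 (for example, '3' is U+0033 and 'P' is U+0050, so "3DJH" would be "Page"). My unconfirmed guess is that ExtractText() is returning raw glyph IDs instead of Unicode text. That would happen if the Identity-H font has no usable ToUnicode mapping and its glyphs follow the standard Macintosh glyph order, where glyph 3 is the space character. The small C# check below only shifts characters back by 29 to test that guess; GlyphOffsetCheck and ShiftBy are my own hypothetical names, not Syncfusion APIs, and the offset of 29 is inferred from this one sample.

```csharp
using System;
using System.Linq;

// Hypothetical diagnostic helper, not part of Syncfusion.
// Assumption: the garbled text is the real text shifted down by a fixed offset
// (29 here, inferred from "3DJH" -> "Page" in the sample output).
static class GlyphOffsetCheck
{
    public static string ShiftBy(string garbled, int offset)
    {
        // Leave whitespace untouched; shift every other character up by offset.
        return new string(garbled
            .Select(c => char.IsWhiteSpace(c) ? c : (char)(c + offset))
            .ToArray());
    }

    static void Main()
    {
        Console.WriteLine(ShiftBy("3DJH", 29));                      // Page
        Console.WriteLine(ShiftBy("$QHVWKHVLD %LOOLQJ 6XPPDU", 29)); // Anesthesia Billing Summar
        Console.WriteLine(ShiftBy("3DWLHQW ,QIRUPDWLRQ", 29));       // Patient Information
        Console.WriteLine(ShiftBy("5(*,675$7,21 0$5<$11", 29));      // REGISTRATION MARYANN
    }
}
```

If every line decodes this cleanly, it would suggest the text is present and only the glyph-to-Unicode mapping is missing. I would still much rather have ExtractText() return the correct text than rely on a hard-coded offset, since the offset only holds for fonts with this particular glyph ordering.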