I have a PDF file generated by a third-party system, with font encoding TrueType (CID) Identity-H, that I am trying to parse using the PdfLoadedPage.ExtractText() method.
I can parse PDFs from other sources with Syncfusion successfully, but these PDFs return only strange characters. Other threads on the forum say that Identity-H encoded PDFs should extract fine, but I can't see what configuration setting I might be missing in order to read this file.
Could you please let me know why ExtractText() would not work here, and whether there is any workaround? FYI, I am using Syncfusion 14.3451.0.49.
Attached is a PDF I am trying to read using the code below:
using (var loadedDocument = new PdfLoadedDocument(filepath))
{
    // Read the first page (index 0 assumed) and extract its text.
    var page = loadedDocument.Pages[0];
    var text = page.ExtractText();
}
The variable text ends up with:
3DJH
$QHVWKHVLD %LOOLQJ 6XPPDU
DQ VWUHHW *HUPDQWRZQ 0'
3KRQH 1XPEHU $JH
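One observation that may help with diagnosis: the output does not look random. Every non-space character appears to sit exactly 29 code points below the character I would expect ("3DJH" becomes "Page", "3KRQH 1XPEHU" becomes "Phone Number"). This might mean glyph indices are coming back instead of Unicode values, but that is only a guess. Below is a minimal sketch of the check; `GlyphOffsetSketch` and `ShiftBy` are hypothetical names, not part of the Syncfusion API, and the offset of 29 is inferred purely from this sample.

```csharp
using System;
using System.Text;

static class GlyphOffsetSketch
{
    // Hypothetical diagnostic helper, not a Syncfusion API.
    // Shifts every non-whitespace character up by a fixed offset.
    // The default of 29 is inferred from this one sample output only.
    public static string ShiftBy(string extracted, int offset = 29)
    {
        var sb = new StringBuilder(extracted.Length);
        foreach (char c in extracted)
        {
            // Whitespace in the output already looks correct, so leave it alone.
            sb.Append(char.IsWhiteSpace(c) ? c : (char)(c + offset));
        }
        return sb.ToString();
    }
}
```

For example, `GlyphOffsetSketch.ShiftBy("$QHVWKHVLD %LOOLQJ")` gives "Anesthesia Billing". I would not treat this as a real workaround, since a constant offset is unlikely to hold for every font or for characters outside the basic Latin range, but it suggests the text is being extracted without being mapped back to Unicode.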