Good day,
I have tried the simple OCR sample provided for WinForms, and it works fine. However, when I use any PDF other than the provided sample.pdf, the recognized text comes back null. The correct number of pages is detected, but the text is null on every page.
Is there a known reason why this would work only for the sample document?
Here is the code:
openFileDialog1.ShowDialog();
string filePath = openFileDialog1.FileName;

// Initialize the OCR processor with the path to the Tesseract binaries
// (SyncfusionTesseract.dll and liblept168.dll)
using (OCRProcessor processor = new OCRProcessor(@"../../Data/Tesseract binaries/"))
{
    // Load the PDF document
    PdfLoadedDocument lDoc = new PdfLoadedDocument(filePath);

    // Set the OCR language
    processor.Settings.Language = Languages.English;

    // Run OCR on the PDF document using the Tesseract language data
    OCRLayoutResult result;
    processor.PerformOCR(lDoc, @"../../Data/Tessdata/", out result);

    // This line throws: the text is null for every PDF other than the sample
    MessageBox.Show(result.Pages[0].Lines[0].Text);

    // Save the OCR-processed PDF document to disk and open it
    lDoc.Save("Sample.pdf");
    lDoc.Close(true);
    System.Diagnostics.Process.Start("Sample.pdf");
}
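To narrow this down, a null-safe dump of the result might show whether the pages come back with no lines at all or with lines whose Text is null. This is only a sketch: the OcrResultDump class and its Describe method are hypothetical names, the Syncfusion.OCRProcessor namespace is an assumption, and it relies only on the Pages, Lines, and Text members already used in the code above.

```csharp
using System.Text;
using Syncfusion.OCRProcessor; // assumed namespace for OCRLayoutResult

// Hypothetical helper, not part of the Syncfusion sample.
static class OcrResultDump
{
    // Builds a per-page summary without assuming that any page,
    // line collection, or line text is non-null.
    public static string Describe(OCRLayoutResult result)
    {
        if (result == null || result.Pages == null)
            return "No OCR layout result was returned.";

        var summary = new StringBuilder();
        int pageNumber = 0;

        foreach (var page in result.Pages)
        {
            pageNumber++;
            int lineCount = 0;
            int linesWithText = 0;

            if (page != null && page.Lines != null)
            {
                foreach (var line in page.Lines)
                {
                    lineCount++;
                    if (line != null && !string.IsNullOrEmpty(line.Text))
                        linesWithText++;
                }
            }

            summary.AppendLine("Page " + pageNumber + ": " + lineCount
                + " line(s), " + linesWithText + " with text");
        }

        return summary.ToString();
    }
}
```

Calling MessageBox.Show(OcrResultDump.Describe(result)) in place of the failing line should report what was recognized on each page instead of throwing.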
Thank you!