OCR - Tesseract Engine has not been initialised

Question

I've built a quick forms app to test parsing text from an image PDF. I've downloaded the OCR processor, added references to OCRprocessor.base, compression.base and pdf.base and included the correct paths to the binaries and Tessdata. However, I am still getting a "Tesseract Engine has not been initialised" error.

My code is:

    public Form1()
    {
        InitializeComponent();

        OCRProcessor processor = new OCRProcessor(@"C:\Program Files(x86)\Syncfusion\Tesseract Binaries\3.02");
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(@"C:\.....\TestPDF.pdf");

        processor.Settings.Language = Languages.English;
        processor.PerformOCR(loadedDocument, @"C:\Program Files (x86)\Syncfusion\Tessdata\3.02"); // Fails here

        loadedDocument.Save(@"C:\......\Read.pdf");
    }

Any ideas on why I keep getting this error?

Sowmiya Loganathan · Answer

Hi Luke,

Thank you for contacting Syncfusion support.

Please follow the OCR troubleshooting steps in the UG documentation linked below to resolve the “Tesseract Engine has not been initialized” error.

https://help.syncfusion.com/file-formats/pdf/working-with-ocr#troubleshooting

Also make sure that Syncfusion.OCRProcessor.Base.dll is unblocked (on Windows, open the file's Properties dialog and look for the Unblock option on the General tab).

Unblock the assembly and rebuild the project to resolve the “Tesseract Engine has not been initialized” error.

Note: Make sure the bin folder does not contain any blocked assemblies.

We have also created a sample that performs OCR on a PDF document. In it, all the required files (input document, Tesseract binaries and Tessdata) are placed in the Data folder.

Please find the sample at the location below:

https://www.syncfusion.com/downloads/support/forum/138112/ze/OCRSample704897765

Please try the sample above on your end and let us know whether it solves the issue.

Regards,

Sowmiya L
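Since the sample keeps the input document, Tesseract binaries and Tessdata together in a Data folder, one way to avoid hard-coded "Program Files" paths is to resolve both folders relative to the application directory. Note that the two path literals in the question spell the folder differently ("Program Files(x86)" and "Program Files (x86)"), which an existence check like the one below would flag. This is only a minimal sketch: the class name `OcrPaths` and the subfolder names "Tesseract Binaries" and "Tessdata" are assumptions for illustration, not the layout of the Syncfusion sample.

```csharp
using System;
using System.IO;

static class OcrPaths
{
    // Builds an absolute path under baseDirectory\Data.
    // "Data" comes from the answer above; the subfolder name is whatever you pass in.
    public static string InDataFolder(string baseDirectory, string subfolder)
    {
        return Path.GetFullPath(Path.Combine(baseDirectory, "Data", subfolder));
    }

    static void Main()
    {
        // Directory the application is running from (normally bin\Debug or bin\Release).
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;

        // Hypothetical subfolder names; adjust to match your project.
        string binaries = InDataFolder(baseDir, "Tesseract Binaries");
        string tessdata = InDataFolder(baseDir, "Tessdata");

        // Print what was resolved so a typo in a path is visible immediately.
        Console.WriteLine("Binaries: {0} (exists: {1})", binaries, Directory.Exists(binaries));
        Console.WriteLine("Tessdata: {0} (exists: {1})", tessdata, Directory.Exists(tessdata));
    }
}
```

The resolved strings can then be passed to the `OCRProcessor` constructor and to `PerformOCR` in place of the literals used in the question.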

Seema Kahane · Answer

Hello,

I am getting the error below after unblocking the 'Syncfusion.Compression.Base.dll' DLL.

Could not process Page '0' of file 'Input' from Property Documents Library. - Unhandled Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Syncfusion.OCRProcessor.Native.OCRApi.InitializeDataPath(IntPtr pt, String path, String lang)
   at Syncfusion.OCRProcessor.OCRProcessor.DoOCR(String[] args)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Program.Main(String[] args)

Please help.

Karmegam Seerangan · Answer

Hi Seema,

Thank you for reaching out to Syncfusion support.

We use Google's Tesseract engine internally to recognize text from scanned PDF documents and images. This engine relies on the Tesseract and Leptonica binaries to process images and extract text, using trained data files (.traineddata) for accurate recognition. These binaries are included in the runtimes/win-x64/native directory, and the trained data files are located in runtimes/tessdata/.

The issue you reported typically occurs when the Tesseract binaries are either missing or the path to them is incorrect. To resolve this, please ensure that the binaries are present in the correct location and that the path is properly configured.

Additionally, this issue may occur if the binaries do not have sufficient read, write, and execute permissions. Please ensure that the required permissions are granted for these binaries.

We have included the necessary binaries in the NuGet package itself. When you install the package and build your application, the binaries are automatically copied to the project directory, and the paths are configured accordingly. For your reference, we have attached a sample project.

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/OCR_Framework_Application

If you are still experiencing issues, we recommend manually specifying the paths for both the Tesseract binaries and the tessdata folder. If the problem persists, please share the following details with us so we can replicate the issue on our end:

- Complete code snippet
- Input documents
- Environment details (OS platform, bit version, and RAM size)

We’ll be happy to assist you further.

Regards,

Karmegam
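The checks described in this reply (binaries present, tessdata present, folders readable, bit version known) can be run as a small preflight step before the OCR call. The following is a rough sketch using only `System.IO`, not part of the Syncfusion API: the class name `OcrPreflight` is hypothetical, both folder paths are supplied by the caller, and the `*.dll` pattern assumes Windows binaries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class OcrPreflight
{
    // Returns a list of problems; an empty list means these basic checks passed.
    public static List<string> Check(string binariesPath, string tessdataPath)
    {
        var problems = new List<string>();
        CheckFolder(binariesPath, "*.dll", "Tesseract binaries", problems);
        CheckFolder(tessdataPath, "*.traineddata", "tessdata", problems);
        return problems;
    }

    private static void CheckFolder(string path, string pattern, string label, List<string> problems)
    {
        if (!Directory.Exists(path))
        {
            problems.Add(label + " folder not found: " + path);
            return;
        }
        try
        {
            // Listing the folder also confirms the process can read it.
            if (Directory.GetFiles(path, pattern).Length == 0)
                problems.Add("No " + pattern + " files in " + label + " folder: " + path);
        }
        catch (UnauthorizedAccessException)
        {
            problems.Add("No read permission on " + label + " folder: " + path);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("Usage: OcrPreflight <binariesPath> <tessdataPath>");
            return;
        }

        // Process bitness is one of the environment details requested above.
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);

        List<string> problems = Check(args[0], args[1]);
        foreach (string problem in problems)
            Console.WriteLine("Problem: " + problem);
        if (problems.Count == 0)
            Console.WriteLine("Both folders look usable.");
    }
}
```

Passing these checks does not guarantee that the engine will initialize. It only rules out the missing-file and wrong-path cases, and reports whether the process is 64-bit so that it can be compared against the win-x64 binaries mentioned above.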