
OCR

I tried the OCR example, but it failed with the error "Tesseract engine has not been initialized".

using (OCRProcessor processor = new OCRProcessor(AppDomain.CurrentDomain.BaseDirectory + "bin\\lib"))
{
    //Language to process the OCR
    processor.Settings.Language = Languages.English;
    //Process OCR by providing loaded PDF document, Data dictionary and language
    processor.PerformOCR(lDoc, "d:\\");
    //Save the OCR processed PDF document in the disk
    Response.Clear();
    //Save the pdf file
    lDoc.Save(@"d:\Sample.pdf");
    lDoc.Close(true);
}
In the app/bin/lib folder:

Does anyone have any suggestions?

16 Replies

AS Abirami Selvan Syncfusion Team December 28, 2015 04:24 AM UTC

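Hi Fred,
Thank you for contacting Syncfusion support.
You need to provide the correct paths both when initializing the Tesseract engine and when processing: the OCRProcessor constructor takes the folder that contains the Tesseract binaries, and PerformOCR takes the folder that contains the language data (tessdata). We have attached a simple sample and the Tesseract assemblies for your reference.
Please refer to the following code snippet: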

//Load the existing PDF document
PdfLoadedDocument lDoc = new PdfLoadedDocument(Server.MapPath("/App_Data/Region.pdf"));
//Initialize the OCR processor with the path to the Tesseract binaries
using (OCRProcessor processor = new OCRProcessor(Server.MapPath(@"\App_Data\Tesseract binaries\")))
{
    //Language to process the OCR
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the loaded PDF document and the path to the language data (tessdata)
    string resulttext = processor.PerformOCR(lDoc, Server.MapPath(@"\App_Data\Tessdata\"));
}
//Save the document
lDoc.Save(Server.MapPath("/Output/output.pdf"));
//Close the document
lDoc.Close(true);
Sample link:
http://www.syncfusion.com/downloads/support/forum/121532/ze/MvcApplication11860559793
Please try this and let us know if you need any further assistance.
Regards,
Abirami.
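
Since the error in this thread came down to a wrong folder argument, a small pre-flight check can make such mistakes visible before the OCR call. The following is only a sketch: the class and method names (OcrPathCheck, FindProblems) are hypothetical, the two DLL names are the ones mentioned in this thread, and the assumption that the language data is a file named <code>.traineddata in the tessdata folder follows the usual Tesseract convention rather than anything confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the Syncfusion API.
public static class OcrPathCheck
{
    // Returns human-readable problems; an empty list means both folders look usable.
    public static List<string> FindProblems(string binariesDir, string tessdataDir, string languageCode)
    {
        var problems = new List<string>();

        if (!Directory.Exists(binariesDir))
        {
            problems.Add("Tesseract binaries folder not found: " + binariesDir);
        }
        else
        {
            // File names taken from this thread; adjust them if your package ships different ones.
            foreach (var name in new[] { "SyncfusionTesseract.dll", "liblept168.dll" })
            {
                if (!File.Exists(Path.Combine(binariesDir, name)))
                    problems.Add("Missing binary: " + name);
            }
        }

        if (!Directory.Exists(tessdataDir))
        {
            problems.Add("Tessdata folder not found: " + tessdataDir);
        }
        else if (!File.Exists(Path.Combine(tessdataDir, languageCode + ".traineddata")))
        {
            // Assumption: language data follows the usual "<code>.traineddata" naming.
            problems.Add("Missing language data: " + languageCode + ".traineddata");
        }

        return problems;
    }
}
```

Calling FindProblems with the two folders you intend to pass to the OCRProcessor constructor and to PerformOCR, plus "eng" for English, and logging the result would have flagged "d:\\" as a folder without language data.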



FV Fred Vreenegoor January 27, 2016 12:31 PM UTC

Thanks for the reply. My mistake was in this line:
processor.PerformOCR(lDoc, "d:\\");

It works well now.
However, the example in the link gives me errors.


FV Fred Vreenegoor January 27, 2016 01:09 PM UTC

Is there a way to set the language to Dutch?
Or does it automatically look for the nld.traineddata file?


AS Abirami Selvan Syncfusion Team January 28, 2016 04:19 AM UTC

Hi Fred,
OCR can be applied to Dutch text, but you need to manually copy the Tesseract language data for the language you want to use.
You can refer to the following link for more details about OCR in other languages:
https://www.syncfusion.com/kb/4051/how-to-support-german-and-other-languages-in-the-ocr-processor
Please try this and let us know if you need any further assistance.
Regards,
Abirami.
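
To confirm that the Dutch data file was actually copied, it can help to list which language files are present in the tessdata folder. This is a minimal sketch using only System.IO; the class name TessdataLanguages is hypothetical, and it assumes the language data files use the .traineddata extension, as the nld.traineddata file mentioned above does. How the language is then selected on the processor is described in the linked article and is not shown here.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the Syncfusion API.
public static class TessdataLanguages
{
    // Returns the language codes (file names without extension) found in the folder,
    // sorted alphabetically. A missing folder yields an empty array.
    public static string[] Installed(string tessdataDir)
    {
        if (!Directory.Exists(tessdataDir))
            return new string[0];

        return Directory.GetFiles(tessdataDir, "*.traineddata")
            .Select(file => Path.GetFileNameWithoutExtension(file))
            .OrderBy(code => code, StringComparer.Ordinal)
            .ToArray();
    }
}
```

If "nld" is not in the returned array, the Dutch language data has not been copied to the folder passed to PerformOCR.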


FV Fred Vreenegoor January 28, 2016 08:16 AM UTC

Thanks a lot, Abirami.

By the way, it might be a good idea to automatically send an email when there is a follow-up.

Fred


BT Bhuvaneswari T Syncfusion Team January 29, 2016 12:40 PM UTC

Hi Fred,

Thanks for your suggestion.

We will consider this feature for an upcoming release.

Regards,
Bhuvaneswari T


ME Megatron February 2, 2016 11:14 PM UTC

Hi, how does the user upload the documents to the server? Does it depend on a MIME type, and if so, what do I need to configure in my IIS server or web.config file?


ME Megatron February 2, 2016 11:16 PM UTC

Hi, also, is there a DLL called syncfu.tesseract? After my SF MVC install there is no such DLL. What do I need to install to get the Tesseract DLL?

Thanks.


ME Megatron February 2, 2016 11:23 PM UTC

Hi, when I followed the help and added the SyncfusionTesseract.dll and liblept168.dll Tesseract assemblies as local references, I got an error. I can see them in my install folder, though. Can you tell me which ones to add and how to add them?






CM Chinnu Muniyappan Syncfusion Team February 3, 2016 10:39 AM UTC

Hi Megatron,

You should not reference the SyncfusionTesseract.dll and liblept168.dll assemblies directly in the project. Instead, provide the local path of the folder containing these assemblies to the OCRProcessor. Please refer to the code snippet and link below for more details.

OCRProcessor processor = new OCRProcessor(@"TesseractBinaries\");
http://help.syncfusion.com/file-formats/pdf/working-with-ocr#prerequisites-and-setting-up-the-tesseract-engine

Regards,
Chinnu
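
A relative folder such as the one in the snippet above is, for ordinary System.IO calls, resolved against the process's current directory, which in a hosted web application is not necessarily the application folder. Whether OCRProcessor resolves its argument the same way is an assumption here. The sketch below (the class name TesseractPaths is hypothetical) turns a relative path into an absolute one based on the application's base directory, so the result does not depend on the current directory.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Syncfusion API.
public static class TesseractPaths
{
    // Absolute paths are normalized and returned; relative paths are
    // anchored at the application's base directory.
    public static string Resolve(string path)
    {
        if (Path.IsPathRooted(path))
            return Path.GetFullPath(path);

        return Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, path));
    }
}
```

The resolved string can then be passed to the OCRProcessor constructor in place of the relative one.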



ME Megatron February 3, 2016 04:35 PM UTC

Thanks for the explanation. Also, the Tesseract DLLs are over 18 months old. Can you please include the latest ones from the GitHub site, along with the latest language files?


PH Praveenkumar H Syncfusion Team February 4, 2016 12:40 PM UTC

Hi Megatron,

Thank you for your update.

We plan to update the Tesseract binaries to the stable version 3.04.

Please let us know if you need further assistance.

With Regards,
Praveen


MW Manuela Wakonig July 6, 2016 08:25 AM UTC

I am using OCR. It works fine on my local machine and on our app server.
But it's not working on the second app server.
The application fails with "Tesseract engine has not been initialized".

In the <appSettings> section of my web.config I have:

    <add key="TesseractBinary" value="C:\Tools\OCRProcessor\" />
    <add key="TesseractData" value="C:\Tools\OCRProcessor\Tessdata\" />
    <add key="GhostScriptBinary" value="C:\Program Files\gs\gs9.19\bin\gsdll64.dll" />

And my code:

private readonly string _tessdata = ConfigurationManager.AppSettings.Get("TesseractData").ToString();
private readonly string _tessBinary = ConfigurationManager.AppSettings.Get("TesseractBinary").ToString();
private readonly string _ghostscript = ConfigurationManager.AppSettings.Get("GhostScriptBinary").ToString();

public List<ShippingInformation> ConvertToOcr(PdfLoadedDocument lDoc)
{
    var result = new List<ShippingInformation>();
    using (var processor = new OCRProcessor(_tessBinary))
    {
        _logger.Info("processor check");
        Bitmap source = null;
        foreach (PdfLoadedPage page in lDoc.Pages)
        {
            var splitDoc = new PdfDocument();
            splitDoc.ImportPage(lDoc, page);
            processor.Settings.Language = Languages.English;

            var pdfStream = new MemoryStream();
            splitDoc.Save(pdfStream);
            splitDoc.Close(true);
            var ldSplitDoc = new PdfLoadedDocument(pdfStream);
            processor.PerformOCR(ldSplitDoc, _tessdata);
            [...]
    }
}


Any suggestions?
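
When behavior differs between two servers, one thing worth ruling out is a difference in web.config. In the code above, AppSettings.Get returns null for a missing key, so the .ToString() call would fail with a NullReferenceException rather than a clear message. The sketch below (the class name RequiredSettings is hypothetical) reads a setting and fails with an explicit message instead. It does not by itself explain the "Tesseract engine has not been initialized" error; it only makes a configuration mismatch easier to spot.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper for reading mandatory appSettings values.
public static class RequiredSettings
{
    // Returns the value for 'key', or throws with a message naming the key
    // when it is missing or blank.
    public static string Get(NameValueCollection settings, string key)
    {
        string value = settings[key]; // null when the key is absent
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException("Missing or empty appSettings key: " + key);
        return value;
    }
}
```

In the web application this could be called as RequiredSettings.Get(ConfigurationManager.AppSettings, "TesseractBinary"), since ConfigurationManager.AppSettings is a NameValueCollection.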


CM Chinnu Muniyappan Syncfusion Team July 7, 2016 09:23 AM UTC

Hi Manuela, 

Thank you for your update. 

Could you please make sure that the Tesseract binaries are properly deployed to the Tesseract folder (e.g., C:\Tools\OCRProcessor\)?
If you are still facing the same issue, could you please create a new Direct-Trac incident for it? We will provide further details through the incident.

Regards, 
Chinnu 



MW Manuela Wakonig July 7, 2016 10:17 AM UTC

All Tesseract binaries are in the folder.
Does the account that runs the application pool need special permissions to initialize tesseract?
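
The thread does not answer the permissions question, but one way to test the hypothesis is to check, from inside the web application itself, whether the current identity can read every file in the Tesseract folders. The sketch below (the class name ReadAccessCheck is hypothetical) tries to open each file for reading and reports the ones that fail. It checks read access only; whether that is sufficient for loading the engine is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic helper; run it under the application pool identity.
public static class ReadAccessCheck
{
    // Tries to open every file under 'dir' (including subfolders) for reading
    // and returns a description of each file that could not be opened.
    // Note: listing the folder itself can throw if the identity may not enumerate it.
    public static List<string> UnreadableFiles(string dir)
    {
        var failures = new List<string>();
        foreach (var file in Directory.GetFiles(dir, "*", SearchOption.AllDirectories))
        {
            try
            {
                using (File.Open(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                {
                    // Opening succeeded; nothing else to do.
                }
            }
            catch (UnauthorizedAccessException ex)
            {
                failures.Add(file + ": " + ex.Message);
            }
            catch (IOException ex)
            {
                failures.Add(file + ": " + ex.Message);
            }
        }
        return failures;
    }
}
```

Logging the result for both the binaries folder and the tessdata folder on each server would show whether the second server differs in file access.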


CM Chinnu Muniyappan Syncfusion Team July 8, 2016 11:30 AM UTC

Hi Manuela, 
 
Thank you for your update. 
 
We have updated the details in your newly created incident. Please refer to the incident for further details.
 
Regards, 
Chinnu 

