
OCR

Thread ID: 121879
Created: Dec 26, 2015 11:31 AM
Updated: Jul 8, 2016 07:30 AM
Platform: ASP.NET MVC
Replies: 16
Tags: PDF
Fred Vreenegoor
Asked On December 26, 2015 11:31 AM

I tried the OCR example, but it failed with "Tesseract engine has not been initialized".

using (OCRProcessor processor = new OCRProcessor(AppDomain.CurrentDomain.BaseDirectory + "bin\\lib"))
{
    //Language to process the OCR
    processor.Settings.Language = Languages.English;
    //Process OCR by providing loaded PDF document, Data dictionary and language
    processor.PerformOCR(lDoc, "d:\\");
    //Save the OCR processed PDF document in the disk
    Response.Clear();
    //Save the pdf file
    lDoc.Save(@"d:\Sample.pdf");
    lDoc.Close(true);
}
In the app/bin/lib:

Does anyone have any suggestions?

Abirami Selvan [Syncfusion]
Replied On December 27, 2015 11:24 PM

Hi Fred,
Thank you for contacting Syncfusion support.
The correct paths must be provided for the Tesseract engine, both when initializing the OCR processor (the folder containing the Tesseract binaries) and when performing OCR (the Tessdata folder). We have attached a simple sample and the Tesseract assemblies for your reference.
Please refer to the following code snippet:

//Load the existing PDF document
PdfLoadedDocument lDoc = new PdfLoadedDocument(Server.MapPath("/App_Data/Region.pdf"));

//Initialize the OCR processor with the folder that contains the Tesseract binaries
using (OCRProcessor processor = new OCRProcessor(Server.MapPath(@"\App_Data\Tesseract binaries\")))
{
    //Language to process the OCR
    processor.Settings.Language = Languages.English;

    //Process OCR by providing the loaded PDF document and the Tessdata folder path
    string resulttext = processor.PerformOCR(lDoc, Server.MapPath(@"\App_Data\Tessdata\"));
}

//Save the document
lDoc.Save(Server.MapPath("/Output/output.pdf"));

//Close the document
lDoc.Close(true);
Sample link:
http://www.syncfusion.com/downloads/support/forum/121532/ze/MvcApplication11860559793
Please try this and let us know if you need any further assistance.
Regards,
Abirami.
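
Editor's note: the error discussed above came down to a wrong folder path, so a small pre-flight check before constructing the OCRProcessor may help narrow things down. The following is only a sketch using standard System.IO calls. The class name OcrPathCheck and its Validate method are hypothetical and not part of the Syncfusion API, and the check assumes the usual Tesseract naming convention of one <language code>.traineddata file per language (for example eng.traineddata) inside the Tessdata folder.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Syncfusion library.
public static class OcrPathCheck
{
    // Returns null when both folders look usable; otherwise returns
    // a message describing the first problem that was found.
    public static string Validate(string tesseractBinaryDir, string tessDataDir, string languageCode)
    {
        if (!Directory.Exists(tesseractBinaryDir))
            return "Tesseract binaries folder not found: " + tesseractBinaryDir;

        if (!Directory.Exists(tessDataDir))
            return "Tessdata folder not found: " + tessDataDir;

        // Assumption: language data follows the <code>.traineddata convention.
        string trainedData = Path.Combine(tessDataDir, languageCode + ".traineddata");
        if (!File.Exists(trainedData))
            return "Language data file not found: " + trainedData;

        return null;
    }
}
```

In a controller this could be called with the same Server.MapPath(...) values that are passed to the OCRProcessor constructor and to PerformOCR, plus "eng" for English, and the returned message logged when it is not null.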


Fred Vreenegoor
Replied On January 27, 2016 07:31 AM

Thanks for the reply. My mistake was in this line:
processor.PerformOCR(lDoc, "d:\\");

It works well now.
However, the example in the link gives me errors.

Fred Vreenegoor
Replied On January 27, 2016 08:09 AM

Is there a way to set the Dutch language?
Or does it automatically look for the nld.traineddata file?

Abirami Selvan [Syncfusion]
Replied On January 27, 2016 11:19 PM

Hi Fred,
OCR can be applied to Dutch text, but the Tesseract language data for that language has to be copied manually into the Tessdata folder.
Please refer to the following link for more details about using OCR with other languages:
https://www.syncfusion.com/kb/4051/how-to-support-german-and-other-languages-in-the-ocr-processor
Please try this and let us know if you need any further assistance.
Regards,
Abirami.
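
Editor's note: since the language data has to be copied by hand, it can be useful to confirm which language files actually ended up in the Tessdata folder. The sketch below only lists the *.traineddata files it finds (nld for Dutch, eng for English, and so on). TessdataLanguages is a hypothetical helper name, and the sketch does not show how the language is selected on the OCRProcessor itself; that part is covered by the KB article linked above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the Syncfusion library.
public static class TessdataLanguages
{
    // Returns the language codes found in the folder, taken from the
    // file names of the *.traineddata files (for example "nld").
    public static List<string> List(string tessDataDir)
    {
        var codes = new List<string>();
        if (!Directory.Exists(tessDataDir))
            return codes; // nothing to report for a missing folder

        foreach (string file in Directory.GetFiles(tessDataDir, "*.traineddata"))
            codes.Add(Path.GetFileNameWithoutExtension(file));

        codes.Sort(StringComparer.OrdinalIgnoreCase);
        return codes;
    }
}
```

If "nld" is missing from the returned list, the Dutch data file has not been copied to the folder that is passed to PerformOCR.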

Fred Vreenegoor
Replied On January 28, 2016 03:16 AM

Thanks a lot, Abirami.

By the way, it might be a good idea to send an email automatically when there is a follow-up.

Fred

Bhuvaneswari T [Syncfusion]
Replied On January 29, 2016 07:40 AM

Hi Fred,

Thanks for your suggestion.

We will consider this feature for an upcoming release.

Regards,
Bhuvaneswari T

Megatron
Replied On February 2, 2016 06:14 PM

Hi, how does the user send or upload the documents to the server? Is it by MIME type, and if so, what do I need in my IIS server or web.config file?

Megatron
Replied On February 2, 2016 06:16 PM

Hi, also, is there a DLL called syncfu.tesseract? After my SF MVC install there is no such DLL. What do I need to install to get the Tesseract DLL?

Thanks

Megatron
Replied On February 2, 2016 06:23 PM

Hi, when I tried to follow the help and add the SyncfusionTesseract.dll and liblept168.dll Tesseract assemblies as local references, I got an error. I can see them in my install folder, though. Can you tell me which ones to add and how?





Chinnu Muniyappan [Syncfusion]
Replied On February 3, 2016 05:39 AM

Hi Megatron,

You do not need to reference the SyncfusionTesseract.dll and liblept168.dll assemblies directly in the project. Instead, provide the local path of the folder containing these assemblies to the OCRProcessor. Please refer to the code snippet and link below for more details.

OCRProcessor processor = new OCRProcessor(@"TesseractBinaries\");
http://help.syncfusion.com/file-formats/pdf/working-with-ocr#prerequisites-and-setting-up-the-tesseract-engine

Regards,
Chinnu
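
Editor's note: a relative folder such as the one in the snippet above is resolved against the current working directory of the process, which in a hosted web application is often not the application folder. One way to make the location explicit is to resolve it against the application's base directory first, as in this sketch. OcrPaths and Resolve are hypothetical names, and the folder name is just the one used in the snippet above.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Syncfusion library.
public static class OcrPaths
{
    // Turns a folder name into an absolute path under the application's
    // base directory. Paths that are already rooted are returned as
    // absolute paths unchanged, because Path.Combine keeps a rooted
    // second argument.
    public static string Resolve(string folder)
    {
        string combined = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, folder);
        return Path.GetFullPath(combined);
    }
}
```

The result of OcrPaths.Resolve("TesseractBinaries") could then be passed to the OCRProcessor constructor in place of the relative string.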


Megatron
Replied On February 3, 2016 11:35 AM

Thanks for the explanation. Also, the Tesseract DLLs are over 18 months old. Can you please include the latest ones from the GitHub site, along with the latest language DLLs?

Praveenkumar H [Syncfusion]
Replied On February 4, 2016 07:40 AM

Hi Megatron,

Thank you for your update.

We plan to update the Tesseract binaries to the stable version 3.04.

Please let us know if you need further assistance.

With Regards,
Praveen

Manuela Wakonig
Replied On July 6, 2016 04:25 AM

I am using OCR. It works fine on my local machine and on our app server,
but it is not working on the second app server.
The application fails with "Tesseract engine has not been initialized".

In the <appSettings> section of my web.config I have:

    <add key="TesseractBinary" value="C:\Tools\OCRProcessor\" />
    <add key="TesseractData" value="C:\Tools\OCRProcessor\Tessdata\" />
    <add key="GhostScriptBinary" value="C:\Program Files\gs\gs9.19\bin\gsdll64.dll" />

and my code:

private readonly string _tessdata = ConfigurationManager.AppSettings.Get("TesseractData").ToString();
private readonly string _tessBinary = ConfigurationManager.AppSettings.Get("TesseractBinary").ToString();
private readonly string _ghostscript = ConfigurationManager.AppSettings.Get("GhostScriptBinary").ToString();

public List<ShippingInformation> ConvertToOcr(PdfLoadedDocument lDoc)
{
    var result = new List<ShippingInformation>();
    using (var processor = new OCRProcessor(_tessBinary))
    {
        _logger.Info("processor check");
        Bitmap source = null;
        foreach (PdfLoadedPage page in lDoc.Pages)
        {
            var splitDoc = new PdfDocument();
            splitDoc.ImportPage(lDoc, page);
            processor.Settings.Language = Languages.English;

            var pdfStream = new MemoryStream();
            splitDoc.Save(pdfStream);
            splitDoc.Close(true);
            var ldSplitDoc = new PdfLoadedDocument(pdfStream);
            processor.PerformOCR(ldSplitDoc, _tessdata);
            [...]
        }
    }
}


Any suggestions?

Chinnu Muniyappan [Syncfusion]
Replied On July 7, 2016 05:23 AM

Hi Manuela, 

Thank you for your update. 

Could you please make sure that the Tesseract binaries have been copied correctly to the Tesseract folder (for example, C:\Tools\OCRProcessor\)?
If you are still facing the same issue, could you please create a new Direct-Trac incident for it? We will provide further details through the incident.

Regards, 
Chinnu 


Manuela Wakonig
Replied On July 7, 2016 06:17 AM

All Tesseract binaries are in the folder.
Does the account that runs the application pool need special permissions to initialize tesseract?
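
Editor's note: the thread does not answer the permissions question, but a small diagnostic run inside the application on the failing server can show which account the code runs under and whether that account can read the files in the binaries folder. This is only a sketch built on standard .NET calls. OcrEnvironmentReport is a hypothetical name, and the bitness line is included on the general assumption that native DLLs must match the bitness of the hosting process; the thread does not confirm either point as the cause.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not part of the Syncfusion library.
public static class OcrEnvironmentReport
{
    // Builds a text report for logging: the account name, the process
    // bitness, and a read test for every DLL in the given folder.
    public static string Describe(string tesseractBinaryDir)
    {
        var report = new StringBuilder();
        report.AppendLine("Account: " + Environment.UserName);
        report.AppendLine("64-bit process: " + Environment.Is64BitProcess);

        if (!Directory.Exists(tesseractBinaryDir))
        {
            // Directory.Exists also returns false when access is denied.
            report.AppendLine("Folder missing or not accessible: " + tesseractBinaryDir);
            return report.ToString();
        }

        foreach (string file in Directory.GetFiles(tesseractBinaryDir, "*.dll"))
        {
            try
            {
                // Opening the file for reading fails if the account lacks read access.
                using (File.OpenRead(file)) { }
                report.AppendLine("Readable: " + Path.GetFileName(file));
            }
            catch (UnauthorizedAccessException ex)
            {
                report.AppendLine("Access denied: " + Path.GetFileName(file) + " (" + ex.Message + ")");
            }
            catch (IOException ex)
            {
                report.AppendLine("I/O error: " + Path.GetFileName(file) + " (" + ex.Message + ")");
            }
        }
        return report.ToString();
    }
}
```

Writing the output of OcrEnvironmentReport.Describe(_tessBinary) to the log on both the working and the failing server gives two reports that can be compared line by line.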

Chinnu Muniyappan [Syncfusion]
Replied On July 8, 2016 07:30 AM

Hi Manuela, 
 
Thank you for your update. 
 
We have updated the details in your newly created incident. Please refer to the incident for further details.
 
Regards, 
Chinnu 

