
OCR, Fileformats: ocr crashes after setting a different language and the new "deu.traineddata"

Thread ID: 132839
Created: Sep 24, 2017 06:32 AM
Updated: Sep 28, 2017 09:39 AM
Platform: ASP.NET MVC 5
Tags: PDF
Asked On September 24, 2017 06:32 AM

Using this works:
OCRProcessor processor = new OCRProcessor(BasePfad + @"\DLLs\OCRProcessor\");
PdfLoadedDocument lDoc = new PdfLoadedDocument(buch);

processor.Settings.Language = "eng";
processor.Settings.Performance = Performance.Slow;

processor.PerformOCR(lDoc, BasePfad + @"\OCRProcessor\Tessdata\");

After changing the language to German and putting the German data file from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into "Tessdata", it no longer works and the program crashes:
processor.Settings.Language = "deu";

The error:
  Problem Event Name:    CLR20r3
  Problem Signature 01:    tmp5CDF.tmp
  Problem Signature 02:
  Problem Signature 03:    59c7880f
  Problem Signature 04:    Syncfusion.OCRProcessor.Base
  Problem Signature 05:    15.3460.0.26
  Problem Signature 06:    5981054b
  Problem Signature 07:    2d
  Problem Signature 08:    81
  Problem Signature 09:    System.AccessViolationException
  OS Version:    6.3.9600.
  Locale ID:    1031
  Additional Information 1:    5861
  Additional Information 2:    5861822e1919d7c014bbb064c64908b2
  Additional Information 3:    5f25
  Additional Information 4:    5f2531ae070278f893fa99352dadd49e

Surya Kumar [Syncfusion]
Replied On September 25, 2017 08:51 AM

Hi JJads, 
Thank you for using Syncfusion products. 
We tried to reproduce the issue using the code snippet you provided together with the German Tesseract data file from the link you gave, but we were unable to reproduce it. Please find the sample we used in our attempt at the link below: 
Could you please provide the following details so that we can help you better? 
  1. Essential Studio version   
  2. Operating System   
  3. Culture settings   
  4. System bit type (32-bit/64-bit)   
  5. Application platform and type of deployment.
Please let us know if you need any further information. 
Surya Kumar 

Replied On September 26, 2017 02:43 PM

I get the same error; please take a look at the attachment (tell me if you need any specific information).

I more or less used the code you mentioned:

// Path.Combine discards BasePfad when the second argument starts with a backslash,
// so the folder names are passed as relative segments.
string tessBin = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"OCRProcessor\")).FullName;
string tessdata = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"Tessdata\")).FullName;
using (OCRProcessor processor = new OCRProcessor(tessBin))
{
    processor.Settings.Language = "deu";
    processor.Settings.Performance = Performance.Slow;

    // Bitmap bitmap = new Bitmap(DataPathBase+"image.TIF");
    processor.PerformOCR(loadedDocument, tessdata);
}

Attachment: result_c9086364.7z

Surya Kumar [Syncfusion]
Replied On September 27, 2017 09:36 AM

Hi Jjads, 
We have analyzed the error log you provided in your last update. We suspect the error is caused either by the Tesseract data used for the OCR process or by missing administrator permissions for the application. 
Please try the following steps to fix the issue: 
1. Try using the Tesseract data that can be downloaded from the link below with the application: 
2. Try running the Visual Studio application in administrator mode (“Run as administrator”). 
Please let us know whether these steps fix the issue. 
Surya Kumar 

Replied On September 27, 2017 11:51 AM

The file you gave me worked; the other file I downloaded did not work: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

Surya Kumar [Syncfusion]
Replied On September 28, 2017 09:39 AM

Hi Jjads, 
Since our OCRProcessor uses Tesseract OCR version 3.02, we recommend using the Tesseract data files for that version. The data files for all languages under this version can be found at the link below: 
Please let us know if you need any further information on this. 
Surya Kumar 

