
OCR, File Formats: OCR crashes after switching to a different language with the new "deu.traineddata"


Using this works:
OCRProcessor processor = new OCRProcessor(BasePfad + @"\DLLs\OCRProcessor\");
                    
PdfLoadedDocument lDoc = new PdfLoadedDocument(buch);

processor.Settings.Language = "eng";
processor.Settings.Performance = Performance.Slow;

processor.PerformOCR(lDoc, BasePfad + @"\OCRProcessor\Tessdata\");


After changing the language to German, it no longer works and the program crashes:
processor.Settings.Language = "deu";
I also put the German data file from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into "Tessdata".

The error:
Problem signature:
  Problem Event Name:    CLR20r3
  Problem Signature 01:    tmp5CDF.tmp
  Problem Signature 02:    0.0.0.0
  Problem Signature 03:    59c7880f
  Problem Signature 04:    Syncfusion.OCRProcessor.Base
  Problem Signature 05:    15.3460.0.26
  Problem Signature 06:    5981054b
  Problem Signature 07:    2d
  Problem Signature 08:    81
  Problem Signature 09:    System.AccessViolationException
  OS Version:    6.3.9600.2.0.0.256.48
  Locale ID:    1031
  Additional Information 1:    5861
  Additional Information 2:    5861822e1919d7c014bbb064c64908b2
  Additional Information 3:    5f25
  Additional Information 4:    5f2531ae070278f893fa99352dadd49e

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\de-DE\erofflps.txt

5 Replies

SK Surya Kumar Syncfusion Team September 25, 2017 12:51 PM UTC

Hi JJads, 
 
Thank you for using Syncfusion products. 
 
We tried to reproduce the issue using the code snippet you provided, together with the German Tesseract data file from the link you gave, but we were unable to reproduce it. Please find the sample we used to try to reproduce the issue at the link below: 
 
Could you please provide the following details so that we can help you better? 
 
  1. Essential Studio version   
  2. Operating System   
  3. Culture settings   
  4. System bit type (32-bit/64-bit)   
  5. Application platform and type of deployment.
 
 
Please let us know if you need any further information. 
 
 
Regards, 
Surya Kumar 




JJ jjads September 26, 2017 06:43 PM UTC

I get the same error; take a look at the attachment (tell me if you need any specific information).

I used more or less the code you mentioned:

// Path.Combine discards BasePfad when the second argument starts with a backslash,
// so the folder names are passed without a leading "\".
string tessBin = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"OCRProcessor\")).FullName;
string tessdata = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"Tessdata\")).FullName;
// The using variable and the variable used inside the block must be the same.
using (OCRProcessor processor = new OCRProcessor(tessBin))
{
    processor.Settings.Language = "deu";
    processor.Settings.Performance = Performance.Slow;

    // Bitmap bitmap = new Bitmap(DataPathBase+"image.TIF");
    processor.PerformOCR(loadedDocument, tessdata);
}




Attachment: result_c9086364.7z


SK Surya Kumar Syncfusion Team September 27, 2017 01:36 PM UTC

Hi Jjads, 
 
We have analyzed the error log you provided in your last update. We suspect the error is caused either by the Tesseract data used for the OCR process or by missing administrator permissions for the application. 

Please follow the steps below to fix the issue. 

1. Try using the Tesseract data that can be downloaded from the link below with the application: 
2. Try running the Visual Studio application in administrator mode ("Run as administrator"). 

Please let us know whether these steps fix the issue. 
 
Regards, 
Surya Kumar 



JJ jjads September 27, 2017 03:51 PM UTC

The file you gave me worked; the other file I downloaded did not work ... https://github.com/tesseract-ocr/tesseract/wiki/Data-Files



SK Surya Kumar Syncfusion Team September 28, 2017 01:39 PM UTC

Hi Jjads, 
Since our OCRProcessor uses Tesseract OCR version 3.0.2, we recommend using the Tesseract data files for version 3.0.2. The data files for all languages under this version can be found at the link below: 
Please let us know if you need any further information on this. 
 
Regards, 
Surya Kumar 
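
Editor's note: since the crash in this thread came down to the language data file, a small guard before the OCR call may help catch the simpler failure modes early. The sketch below is only an illustration: `TessdataCheck` and `HasLanguageFile` are hypothetical names that are not part of the Syncfusion API, and the check only confirms that a non-empty `<language>.traineddata` file is present. It cannot tell whether the file was built for the Tesseract version the OCRProcessor uses, which was the actual cause here.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Syncfusion or Tesseract.
static class TessdataCheck
{
    // Returns true when "<language>.traineddata" exists in tessdataDir
    // and is not empty. It does NOT validate the file's Tesseract version.
    public static bool HasLanguageFile(string tessdataDir, string language)
    {
        string file = Path.Combine(tessdataDir, language + ".traineddata");
        return File.Exists(file) && new FileInfo(file).Length > 0;
    }
}
```

In the code from this thread it could be called right before `processor.PerformOCR(lDoc, tessdata)`, for example `if (!TessdataCheck.HasLanguageFile(tessdata, "deu")) { /* report the missing file and skip OCR */ }`.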

