OCR, Fileformats: ocr crashes after setting a different language and the new "deu.traineddata"

5 Replies
2 Participants

Created by
JJ jjads

Platform
ASP.NET MVC

Control
PDF

Created On
Sep 24, 2017 10:32 AM UTC

Last Activity On
Sep 28, 2017 01:39 PM UTC

Using this works:
OCRProcessor processor = new OCRProcessor(BasePfad + @"\DLLs\OCRProcessor\");

PdfLoadedDocument lDoc = new PdfLoadedDocument(buch);

processor.Settings.Language = "eng";
processor.Settings.Performance = Performance.Slow;

processor.PerformOCR(lDoc, BasePfad + @"\OCRProcessor\Tessdata\");

After changing the language as shown here and putting the German file from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into "Tessdata", it no longer works and the program crashes:
processor.Settings.Language = "deu";

The error:
Problem signature:
Problem Event Name:   CLR20r3
Problem Signature 01:   tmp5CDF.tmp
Problem Signature 02:   0.0.0.0
Problem Signature 03:   59c7880f
Problem Signature 04:   Syncfusion.OCRProcessor.Base
Problem Signature 05:   15.3460.0.26
Problem Signature 06:   5981054b
Problem Signature 07:   2d
Problem Signature 08:   81
Problem Signature 09:   System.AccessViolationException
OS Version:   6.3.9600.2.0.0.256.48
Locale ID:   1031
Additional Information 1:   5861
Additional Information 2:   5861822e1919d7c014bbb064c64908b2
Additional Information 3:   5f25
Additional Information 4:   5f2531ae070278f893fa99352dadd49e

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\de-DE\erofflps.txt

5 Replies

SK Surya Kumar Syncfusion Team September 25, 2017 12:51 PM UTC

Hi JJads,

Thank you for using Syncfusion products.

We tried to reproduce the issue using the code snippet you provided together with the German Tesseract data file from the link you mentioned, but we were unable to reproduce it. The sample we used is available at the link below:

http://www.syncfusion.com/downloads/support/forum/132839/ze/MVCOcr836454879

Could you please provide the following details so that we can help you better?

Essential Studio version
Operating System
Culture settings
System bit type (32-bit/64-bit)
Application platform and type of deployment.

Please let us know if you need any further information.

Regards,

Surya Kumar

JJ jjads September 26, 2017 06:43 PM UTC

I get the same error. Please take a look at the attachment (tell me if you need any specific information).

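// No leading backslash on the second argument: Path.Combine discards BasePfad if that argument is rooted.
string tessBin = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"OCRProcessor\")).FullName;
string tessdata = new System.IO.DirectoryInfo(Path.Combine(BasePfad, @"Tessdata\")).FullName;
// The variable declared here must match the name used inside the block.
using (OCRProcessor processor = new OCRProcessor(tessBin))
{
    processor.Settings.Language = "deu";
    processor.Settings.Performance = Performance.Slow;

    // Bitmap bitmap = new Bitmap(DataPathBase+"image.TIF");
    // tessdata is the folder that contains deu.traineddata.
    processor.PerformOCR(loadedDocument, tessdata);
}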

I more or less used the code you mentioned.

Attachment: result_c9086364.7z

SK Surya Kumar Syncfusion Team September 27, 2017 01:36 PM UTC

Hi Jjads,

We have analyzed the error log from your last update. We suspect the error is caused either by the Tesseract data file used for the OCR process or by missing administrator permissions for the application.

Please follow the steps below to fix the issue.

1. Try using the Tesseract data file that can be downloaded from the link below:

https://github.com/tesseract-ocr/tessdata/raw/3.04.00/deu.traineddata

2. Try running Visual Studio in administrator mode (“Run as administrator”).

Please let us know whether these steps fix the issue.

Regards,

Surya Kumar

JJ jjads September 27, 2017 03:51 PM UTC

The file you gave me worked. The other file, which I downloaded from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, did not work.

SK Surya Kumar Syncfusion Team September 28, 2017 01:39 PM UTC

Hi Jjads,

Our OCRProcessor uses Tesseract OCR version 3.02, so we recommend using the Tesseract data files built for that version. The data files for all languages under this version can be found at the link below:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302
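
As a rough illustration of the advice above, a small pre-flight check can at least confirm that a data file for the chosen language is present in the Tessdata folder before OCR starts. This is only a sketch: TessdataCheck and RequireTrainedData are hypothetical names, not part of the Syncfusion API. It also cannot tell which Tesseract version a file was built for, so it does not replace downloading the matching data file.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Syncfusion API): fails early with a
// readable exception if the language data file is missing or empty.
static class TessdataCheck
{
    public static string RequireTrainedData(string tessdataDir, string language)
    {
        // Tesseract language data files are named "<language>.traineddata".
        string file = Path.Combine(tessdataDir, language + ".traineddata");

        if (!File.Exists(file))
            throw new FileNotFoundException(
                "No trained data for language '" + language + "' in " + tessdataDir, file);

        // An empty file usually points to a failed or interrupted download.
        if (new FileInfo(file).Length == 0)
            throw new InvalidDataException("Trained data file is empty: " + file);

        return file;
    }
}
```

It could be called as TessdataCheck.RequireTrainedData(tessdata, "deu") right before processor.PerformOCR(loadedDocument, tessdata), so that a missing file shows up as a normal exception.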

Please let us know if you need any further information on this.

Regards,

Surya Kumar
