OCR from a PDF document

1 Reply
2 Participants

Created by
JS Josef Szeliga

Platform
WinForms

Control
PDF

Created On
Feb 26, 2016 12:23 AM UTC

Last Activity On
Feb 26, 2016 11:49 AM UTC

Firstly, I'd like to know whether I have posted this thread in the correct location.

My platform is a .NET Console application, but this option does not exist in your Platforms list when posting a new thread.

Secondly, I'm using your OCR library, which is not a control, so which control should I select when starting a new thread?

I've successfully extracted text from a PDF image using your OCRProcessor.

But three issues remain:

1. The OCR has placed 4 non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a single "\r\n" is placed at the end of actual new lines?
2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?
3. The application will run unattended on a server. Is it possible to redirect the standard output of the OCR progress, which is displayed in a command window, so that it is sent to a file instead?

1 Reply

CM Chinnu Muniyappan Syncfusion Team February 26, 2016 11:49 AM UTC

Hi Josef,

Thank you for contacting Syncfusion support.

We were not able to reproduce the reported issue. We created a simple sample in which we tried to reproduce it. Could you please modify the sample or provide an input document? That would help us investigate further. We have also attached screenshots for your reference; please refer to the screenshots and the sample below.

Query: Since my platform is a .Net Console application and this option does not exist in your Platforms list when posting a new thread?
Response: You can post queries related to .NET Console applications under “General Discussion”. Please find the screenshot below.

Query 1: The OCR has placed 4 non-printable characters i.e. "\r\n\r\n" in place of every space character, can this be prevented and only an "\r\n" placed at the end of actual new lines?
Response: We get only “\r\n” at the end of each new line. Please refer to the screenshot below.

Query 2: Where a table exists the OCR does not return a cell delimiter, e.g. a tab character, so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?
Response: By default, the characters are preserved as in the table format. Please check the screenshot below.

Query 3: The application will run unattended on a server. Is it possible to change the standard output of the OCR progress, which is displayed in a command window, so that it is sent to a file instead?
Response: Yes, it is possible. You can write the OCR'ed results directly to the command window:

    string text = processor.PerformOCR(new System.Drawing.Bitmap(Image.FromFile("../../ocrImg2.png")), "../../Tessdata/");
    Console.WriteLine(text);
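
For readers who still see "\r\n\r\n" where spaces are expected, or who want the recognized text in a file rather than in the command window, the following is a minimal sketch of one possible post-processing step. It is not taken from the Syncfusion sample: the class name OcrTextCleanup, the helper CollapseDoubleBreaks, the file name ocr-output.txt, and the sample string standing in for the result of PerformOCR are all hypothetical, and only standard .NET calls are used. It assumes that every "\r\n\r\n" run in the output really stands for a space, and it does not capture any progress messages the OCR engine itself prints.

```csharp
using System;
using System.IO;

static class OcrTextCleanup
{
    // Hypothetical helper: replaces each "\r\n\r\n" run with a single space
    // and leaves single "\r\n" line breaks untouched.
    public static string CollapseDoubleBreaks(string ocrText)
    {
        return ocrText.Replace("\r\n\r\n", " ");
    }

    static void Main()
    {
        // Stand-in for the string returned by processor.PerformOCR(...).
        string text = "Invoice\r\n\r\nNumber\r\n\r\n42\r\nTotal\r\n\r\ndue";

        string cleaned = CollapseDoubleBreaks(text);

        // Write the result to a file instead of the command window.
        // "ocr-output.txt" is an assumed path; adjust it for the server.
        File.WriteAllText("ocr-output.txt", cleaned);
    }
}
```

If genuine paragraph breaks in the input also come back as "\r\n\r\n", this replacement would merge those paragraphs, so it is worth checking against a known document first.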

Sample link:
https://www.syncfusion.com/downloads/support/forum/123210/ze/OcrProcess321102647

Please let us know if you have any concerns.

Regards,
Chinnu
