
OCR from a PDF document

Thread ID: 123210
Created: Feb 25, 2016 07:23 PM
Updated: Feb 26, 2016 06:49 AM
Platform: Windows Forms
Replies: 1
Tags: PDF
Josef Szeliga
Asked On February 25, 2016 07:23 PM

First, I'd like to know whether I have posted this thread in the correct location, since my platform is a .NET console application and that option does not exist in your Platforms list when posting a new thread.
Second, I'm using your OCR library, which is not a control, so which control should I select when starting a new thread?

I've successfully extracted text from a PDF image using your OCRProcessor.

But three issues remain:
  1. The OCR has placed four non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines?
  2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?
  3. The application will run unattended on a server. Is it possible to redirect the OCR progress output, which is displayed in a command window, to a file instead?

Chinnu Muniyappan [Syncfusion]
Replied On February 26, 2016 06:49 AM

Hi Josef,

Thank you for contacting Syncfusion support.

We are not able to reproduce the reported issue. We have created a simple sample with which we tried to reproduce it. Could you please modify the sample or provide an input document? That would help us investigate further. We have also attached screenshots for your reference; please refer to the screenshots and sample below.



Since my platform is a .NET console application and this option does not exist in your Platforms list when posting a new thread?

You can post queries related to your .NET console application under "General Discussion". Please find the screenshot below.



1. The OCR has placed four non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines?

We get only "\r\n" at the end of each new line; please refer to the screenshot below.
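
If the doubled line breaks still appear with a particular input document, one possible workaround is to post-process the string returned by the OCR call. The following is only a minimal sketch using standard .NET string handling, not a Syncfusion API. It assumes, as described in the question, that every "\r\n\r\n" stands in for a space and that a single "\r\n" marks a real line break. The class and method names (OcrTextCleanup, Normalize) are hypothetical.

```csharp
using System;

public static class OcrTextCleanup
{
    // Assumption (taken from the question): a doubled CRLF replaces a space,
    // while a single CRLF is a genuine end of line.
    public static string Normalize(string ocrText)
    {
        if (ocrText == null) throw new ArgumentNullException(nameof(ocrText));

        // Replace each doubled CRLF with one space; single CRLFs are left untouched.
        return ocrText.Replace("\r\n\r\n", " ");
    }

    public static void Main()
    {
        // Hypothetical OCR output showing the reported pattern.
        string raw = "Invoice\r\n\r\nnumber\r\n\r\n42\r\nTotal\r\n\r\ndue";

        // Prints two lines: "Invoice number 42" and "Total due".
        Console.WriteLine(Normalize(raw));
    }
}
```

Note that this cannot tell a genuinely empty line apart from a substituted space, so it only fits documents where the assumption above holds.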


2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?

By default, the characters are preserved as in the table format; please check the screenshot below.


3. The application will run unattended on a server. Is it possible to redirect the OCR progress output, which is displayed in a command window, to a file instead?

Yes, it is possible. You can write the OCR'ed results directly to the command window.

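// Run OCR on the image; the second argument is the path to the Tessdata folder.
string text = processor.PerformOCR(new System.Drawing.Bitmap(Image.FromFile("../../ocrImg2.png")), "../../Tessdata/");
// Write the recognized text to the command window.
Console.WriteLine(text);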
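
For an unattended server, where the console output should end up in a file instead, one option is to redirect the standard .NET console streams around the OCR call. The sketch below uses only Console.SetOut, Console.SetError and StreamWriter from the .NET base class library. The helper name RunWithLog and the file name ocr-progress.log are hypothetical. It is an assumption that the progress messages are written through the managed Console class; if they come from a native component writing straight to the process's standard output, this will not capture them, and redirecting at the shell (for example `MyApp.exe > ocr.log 2>&1`) may be needed instead.

```csharp
using System;
using System.IO;

public static class ConsoleCapture
{
    // Runs 'work' with Console.Out and Console.Error redirected to 'logPath'
    // (appending to the file), then restores the original writers.
    public static void RunWithLog(string logPath, Action work)
    {
        TextWriter originalOut = Console.Out;
        TextWriter originalError = Console.Error;

        using (var log = new StreamWriter(logPath, true) { AutoFlush = true })
        {
            Console.SetOut(log);
            Console.SetError(log);
            try
            {
                work();
            }
            finally
            {
                // Always restore the console, even if the work throws.
                Console.SetOut(originalOut);
                Console.SetError(originalError);
            }
        }
    }

    public static void Main()
    {
        RunWithLog("ocr-progress.log", () =>
        {
            // Placeholder for the real OCR call, e.g. processor.PerformOCR(...).
            Console.WriteLine("OCR started");
        });

        Console.WriteLine("Done; see ocr-progress.log");
    }
}
```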



Sample link:
http://www.syncfusion.com/downloads/support/forum/123210/ze/OcrProcess321102647



Please let us know if you have any concerns.


Regards,
Chinnu
