OCR from a PDF document

Firstly, I'd like to know whether I have posted this thread in the correct location, since my platform is a .NET Console application and that option does not exist in your Platforms list when posting a new thread.
Secondly, I'm using your OCR library, which is not a control, so which control should I select when starting a new thread?

I've successfully extracted text from a PDF image using your OCRProcessor.

But three issues remain:
  1. The OCR has placed four non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines?
  2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?
  3. The application will run unattended on a server. Is it possible to redirect the OCR progress output, which is displayed in a command window, to a file instead?

1 Reply

CM Chinnu Muniyappan Syncfusion Team February 26, 2016 11:49 AM UTC

Hi Josef,

Thank you for contacting Syncfusion support.

We were unable to reproduce the reported issue. We created a simple sample to try to reproduce it; could you please modify the sample or provide an input document? That would help us investigate further. We have also attached a screenshot for your reference; please refer to the screenshot and sample below.



Since my platform is a .NET Console application and this option does not exist in your Platforms list when posting a new thread?

You can post queries related to .NET Console applications under “General Discussion”. Please find the screenshot below.



1. The OCR has placed four non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines?

In our testing, we get only “\r\n” at the end of each new line; please refer to the screenshot below.
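
If the doubled line breaks still appear in your environment, one possible workaround is to post-process the string returned by the OCR call. The following is only a minimal sketch: `OcrTextCleanup` and `CollapseDoubledLineBreaks` are hypothetical names, not part of the Syncfusion API, and the sketch assumes that every "\r\n\r\n" sequence stands for a space, as described in the question. If the document contains genuine blank lines between paragraphs, they would be collapsed as well.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Syncfusion API.
public static class OcrTextCleanup
{
    // Replaces each doubled line break ("\r\n\r\n") with a single space
    // and leaves single "\r\n" line breaks untouched.
    // Assumption: a doubled line break always stands for a space.
    public static string CollapseDoubledLineBreaks(string ocrText)
    {
        if (ocrText == null)
        {
            throw new ArgumentNullException(nameof(ocrText));
        }

        return ocrText.Replace("\r\n\r\n", " ");
    }
}
```

It could be called on the OCR result before the text is used, for example `text = OcrTextCleanup.CollapseDoubledLineBreaks(text);`.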


2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?

By default, the characters are preserved as in the table format; please check the screenshot below.


3. The application will run unattended on a server. Is it possible to redirect the OCR progress output, which is displayed in a command window, to a file instead?

Yes, it is possible. You can write the OCR results directly to the command window:

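// Run OCR on the image and capture the recognized text.
string text = processor.PerformOCR(new System.Drawing.Bitmap(Image.FromFile("../../ocrImg2.png")), "../../Tessdata/");
// Write the recognized text to the command window.
Console.WriteLine(text);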
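
For an unattended server run, console output can also be sent to a file using the standard .NET `Console.SetOut` method. The following is only a sketch: `UnattendedOutput` and `RunWithConsoleRedirectedTo` are hypothetical names, not part of the Syncfusion API. It captures only text written through the managed `Console` class. If the OCR engine writes its progress messages from native code straight to the process's standard output, they would not be captured this way, and redirecting the output when launching the executable from the shell may be needed instead.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Syncfusion API.
public static class UnattendedOutput
{
    // Runs the given action while Console.Out is redirected to logPath
    // (appending to the file), then restores the original console writer.
    public static void RunWithConsoleRedirectedTo(string logPath, Action action)
    {
        TextWriter original = Console.Out;
        using (var log = new StreamWriter(logPath, append: true))
        {
            Console.SetOut(log);
            try
            {
                action();
            }
            finally
            {
                // Flush pending text to the file before restoring the console.
                Console.Out.Flush();
                Console.SetOut(original);
            }
        }
    }
}
```

The OCR call and `Console.WriteLine(text)` from the snippet above could then be wrapped in the action passed to `RunWithConsoleRedirectedTo`. If only the recognized text is needed in a file, `File.WriteAllText` with a path of your choice is a simpler option.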



Sample link:
https://www.syncfusion.com/downloads/support/forum/123210/ze/OcrProcess321102647



Please let us know if you have any concerns.


Regards,
Chinnu
