OCR from a PDF document
First, I'd like to know whether I have posted this thread in the correct location. My platform is a .NET console application, and that option does not exist in your Platforms list when posting a new thread.
Second, I'm using your OCR library, which is not a control, so which control should I select when starting a new thread?
I've successfully extracted text from a PDF image using your OCRProcessor.
But 3 issues remain:
- The OCR has placed 4 non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines?
- Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting?
- The application will run unattended on a server. Is it possible to have the OCR progress output, which is displayed in a command window, sent to a file instead?
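
To make the first issue concrete, here is a minimal post-processing sketch. It is a workaround, not a library setting, and it assumes the output looks exactly as described above: "\r\n\r\n" wherever a space belongs and a single "\r\n" at real line ends. The names OcrText and CollapseDoubleBreaks are hypothetical helpers, not part of the OCR library.

```csharp
using System;

public static class OcrText
{
    // Assumption: "\r\n\r\n" stands in for a space, and a single "\r\n"
    // marks a genuine line end. Only the doubled sequence is replaced.
    public static string CollapseDoubleBreaks(string ocrText)
    {
        return ocrText.Replace("\r\n\r\n", " ");
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample imitating the reported output.
        string raw = "Invoice\r\n\r\nnumber\r\n\r\n42\r\nTotal\r\n\r\ndue";
        string cleaned = OcrText.CollapseDoubleBreaks(raw);
        Console.WriteLine(cleaned);
    }
}
```

If genuine blank lines can occur in the source document, this replacement would merge them too, so it only fits input that matches the assumption.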
Chinnu Muniyappan
Syncfusion Team
February 26, 2016 11:49 AM UTC
Hi Josef,
Thank you for contacting Syncfusion support.
We were not able to reproduce the reported issue. We have created a simple sample with which we tried to reproduce it. Could you please modify the sample or provide an input document? It would help us investigate further. We have also attached screenshots for your reference; please refer to the screenshots and sample below.
| Since my platform is a .NET console application and this option does not exist in your Platforms list when posting a new thread? | You can post queries related to your .NET console application under “General Discussion”. Please find the screenshot below. |
| 1. The OCR has placed 4 non-printable characters, i.e. "\r\n\r\n", in place of every space character. Can this be prevented, so that only a "\r\n" is placed at the end of actual new lines? | We get only “\r\n” at the end of each new line; please refer to the screenshot below. |
| 2. Where a table exists, the OCR does not return a cell delimiter (e.g. a tab character), so it's not possible to know which cell the text was in. Can this be achieved through some kind of property setting? | By default, the characters are preserved as in the table format; please check the screenshot below. |
| 3. The application will run unattended on a server. Is it possible to have the OCR progress output, which is displayed in a command window, sent to a file instead? | Yes, it is possible. You can write the OCR'ed results directly to the command window: string text = processor.PerformOCR(new System.Drawing.Bitmap(Image.FromFile("../../ocrImg2.png")), "../../Tessdata/"); Console.WriteLine(text); |
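
The reply writes the OCR result with Console.WriteLine. As a rough sketch of sending such output to a file instead, the standard .NET Console.SetOut method can redirect managed console writes. The helper name RunWithOutputToFile, the log file name, and the placeholder text are assumptions for illustration; the PerformOCR call is left as a comment because it needs the library and input files.

```csharp
using System;
using System.IO;

public static class ConsoleRedirect
{
    // Runs 'work' while Console.Out points at the file 'logPath',
    // then restores the original console writer.
    public static void RunWithOutputToFile(string logPath, Action work)
    {
        TextWriter original = Console.Out;
        using (var writer = new StreamWriter(logPath, append: true))
        {
            writer.AutoFlush = true;
            Console.SetOut(writer);
            try
            {
                work();
            }
            finally
            {
                Console.SetOut(original);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        ConsoleRedirect.RunWithOutputToFile("ocr-output.log", () =>
        {
            // Placeholder for: string text = processor.PerformOCR(...);
            string text = "recognized text would go here";
            Console.WriteLine(text);
        });
    }
}
```

Console.SetOut only affects text written through the managed Console class. If the progress messages come from a native OCR engine writing to the process's standard output, they would likely bypass it, and redirecting at the shell when launching the application may be needed instead.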
Sample link:
https://www.syncfusion.com/downloads/support/forum/123210/ze/OcrProcess321102647
Please let us know if you have any concern.
Regards,
Chinnu
JS Josef Szeliga
- Feb 26, 2016 12:23 AM UTC