
How to get the OCR'ed text of each page separately?

Platform: WinForms |
Control: PDF |
Published Date: January 19, 2016 |
Last Revised Date: January 24, 2020
Tags: pdf, ocr

OCR can be performed on individual pages of a PDF document, so the recognized text of each page can be obtained separately. The code example and sample below show how to do this.

C#:

using System.IO;
using System.Text;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

StringBuilder resultText = new StringBuilder();
string outFileName = @"..\..\Data\result.txt";
//Load the existing PDF document.
PdfLoadedDocument lDoc = new PdfLoadedDocument(@"..\..\Data\Region.pdf");
for (int i = 0; i < lDoc.Pages.Count; i++)
{
    //Initialize the OCR processor with the path to the Tesseract binaries.
    using (OCRProcessor processor = new OCRProcessor(@"..\..\Tesseract binaries\"))
    {
        //Favor recognition accuracy over speed.
        processor.Settings.Performance = Performance.Slow;
        //Write a page header. Page indices are zero-based, so add 1 for display.
        resultText.AppendLine();
        resultText.AppendLine("Page no " + (i + 1));
        //Perform OCR on page i only (start and end page index are both i).
        resultText.AppendLine(processor.PerformOCR(lDoc, i, i, @"..\..\Tessdata\"));
    }
}
//Save the OCR'ed text with page numbers.
File.WriteAllText(outFileName, resultText.ToString());
//Close the document.
lDoc.Close(true);


Sample Link:

http://www.syncfusion.com/downloads/support/directtrac/147065/ze/OCRPageByPage-465922787
