How to Convert Scanned Image to Searchable PDF by Processing OCR in WinForms

The Syncfusion .NET Optical Character Recognition (OCR) Library extracts text from scanned PDFs and images. With a few lines of code, a scanned PDF document containing a raster image can be converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or a searchable PDF document. The .NET OCR Library uses the Tesseract OCR engine.
This article shows how to use the library to convert a scanned image into a searchable and selectable PDF document in C# and VB.NET.

Steps to convert a scanned image to a searchable PDF (OCR) programmatically:

Create a new C# console application project.
Install the Syncfusion.Pdf.OCR.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.
Include the following namespaces in the Program.cs file.

C#

using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;
using Syncfusion.OCRProcessor;
using System.IO;
using System.Diagnostics;

VB.NET

Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Graphics
Imports Syncfusion.Pdf.Parsing
Imports Syncfusion.OCRProcessor
Imports System.IO
Imports System.Diagnostics

Use the following code sample in the Program.cs file to convert a scanned image to a searchable PDF.

C#

//Create a new PDF document.
PdfDocument document = new PdfDocument();
//Add a page to the document.
PdfPage page = document.Pages.Add();
//Create PDF graphics for a page.
PdfGraphics graphics = page.Graphics;
//Load the image from the disk.
PdfBitmap image = new PdfBitmap("Input.jpg");
//Draw the image.
graphics.DrawImage(image, 0, 0, page.GetClientSize().Width, page.GetClientSize().Height);
//Save the document into the stream.
MemoryStream stream = new MemoryStream();
document.Save(stream);
//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load a PDF document.
    PdfLoadedDocument lDoc = new PdfLoadedDocument(stream);
    //Set OCR language to process.
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document.
    processor.PerformOCR(lDoc);
    //Save the OCR processed PDF document on the disk.
    lDoc.Save("OCR.pdf");
    //Close the document.
    lDoc.Close(true);
}
//Open the PDF file so that the result is shown in the default PDF viewer.
Process.Start("OCR.pdf");

VB.NET

'Create a new PDF document.
Dim document As New PdfDocument()
'Add a page to the document.
Dim page As PdfPage = document.Pages.Add()
'Create PDF graphics for a page.
Dim graphics As PdfGraphics = page.Graphics
'Load the image from the disk.
Dim image As New PdfBitmap("Input.jpg")
'Draw the image.
graphics.DrawImage(image, 0, 0, page.GetClientSize().Width, page.GetClientSize().Height)
'Save the document into the stream.
Dim stream As New MemoryStream()
document.Save(stream)
'Initialize the OCR processor.
Using processor As New OCRProcessor()
    'Load a PDF document.
    Dim lDoc As New PdfLoadedDocument(stream)
    'Set OCR language to process.
    processor.Settings.Language = Languages.English
    'Process OCR by providing the PDF document.
    processor.PerformOCR(lDoc)
    'Save the OCR processed PDF document on the disk.
    lDoc.Save("OCR.pdf")
    'Close the document.
    lDoc.Close(True)
End Using
'Open the PDF file so that the result is shown in the default PDF viewer.
Process.Start("OCR.pdf")

A complete working sample can be downloaded from OCRSample.Zip.

When you run the program, it generates the searchable PDF document OCR.pdf and opens it in the default PDF viewer.

Take a moment to peruse the documentation, where you will find other options, such as performing OCR on an image, a region of a document, a rotated page, and large PDF documents, along with code examples.

Refer here to explore the rich set of Syncfusion Essential PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or the NuGet feed, you must include a license key in your projects. Refer to the linked documentation to learn about generating and registering the Syncfusion license key in your application to use the components without a trial message.
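
As a minimal sketch of the registration step mentioned in the note above: the key is typically registered once at application startup, before any Syncfusion component is used. The string "YOUR LICENSE KEY" is a placeholder, not a real key, and the sketch assumes the Syncfusion.Licensing assembly is referenced by the project.

```csharp
using Syncfusion.Licensing;

static class Program
{
    static void Main()
    {
        //Register the license key before creating any Syncfusion objects.
        //"YOUR LICENSE KEY" is a placeholder; replace it with the key generated for your account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        //The OCR conversion code from the sample above would follow here.
    }
}
```

Placing the call at the top of Main keeps the registration ahead of the PdfDocument and OCRProcessor usage shown earlier.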
