Perform OCR on PDFs in C# with the .NET PDF Library

The Syncfusion® .NET optical character recognition (OCR) library extracts text from scanned PDFs and images. With a few lines of C# code, it converts raster-based PDF pages into searchable and selectable PDF documents. Developers can export the recognized content as plain text, structured data formats, or searchable PDF files. Built on the Tesseract OCR engine, the library supports a wide range of document-processing scenarios.

Watch this video to see how to perform OCR on PDF files using the OCR processing library.


Perform OCR on PDF documents in C#

Learn how to programmatically perform optical character recognition (OCR) on scanned PDF documents in C# using the Syncfusion OCR processing library. This guide demonstrates how to extract text from scanned images using the Tesseract engine.

Step 1: Create a new C# Console Application project

Begin by creating a new C# Console Application project in Visual Studio or your preferred IDE to implement PDF OCR functionality.

Step 2: Install Syncfusion PDF OCR NuGet package

Install the Syncfusion.PDF.OCR.Net.Core NuGet package in your C# project from NuGet.org. This package provides Tesseract-based OCR capabilities for PDF documents.

Step 3: Add required namespaces for PDF OCR processing

Import the following namespaces in your Program.cs file to access OCR processor classes and PDF parsing methods:

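```csharp
using System.IO; // for Path; already imported implicitly in .NET 6+ projects with ImplicitUsings enabled
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;
```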

Step 4: Initialize the OCR processor

Create an instance of the OCRProcessor class. This processor uses the Tesseract engine to perform optical character recognition on scanned PDF pages.

```csharp
// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor())
{
    // Steps 5 to 7 run inside this block
}
```

Step 5: Load the scanned PDF document

Use the PdfLoadedDocument class to load your scanned PDF file that contains images or non-searchable text.

```csharp
// Load the PDF document
using (PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(Path.GetFullPath(@"Data/Input.pdf")))
{
    // Steps 6 and 7 run inside this block
}
```
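If the scanned PDF arrives as a stream (for example, from an upload) rather than as a file path, PdfLoadedDocument can also be constructed from a Stream. The following is a minimal sketch, assuming a .NET 6+ console project with top-level statements; the file-existence check and the Data/Input.pdf path are illustrative choices, not a required pattern. The stream is kept open for as long as the document is in use.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf.Parsing;

string inputPath = Path.GetFullPath(@"Data/Input.pdf");

// Fail early with a clear message if the scanned input is missing (illustrative check)
if (!File.Exists(inputPath))
{
    Console.Error.WriteLine($"Input PDF not found: {inputPath}");
    return;
}

// The document is disposed first, then the stream it was loaded from
using (FileStream inputStream = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
using (PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(inputStream))
{
    // Report how many pages were loaded from the scanned document
    Console.WriteLine($"Loaded {pdfLoadedDocument.Pages.Count} page(s) from {inputPath}");
}
```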

Step 6: Configure OCR language settings

Set the OCR language to match the text in your scanned document. This ensures accurate character recognition for the specified language.

```csharp
// Set OCR language to process
processor.Settings.Language = Languages.English;
```

Step 7: Perform OCR and save the PDF document

Call the PerformOCR method on the loaded PDF document to recognize text in the scanned page images and make the content searchable. Then save the processed document with the Save method.

```csharp
// Process OCR by providing the PDF document
processor.PerformOCR(pdfLoadedDocument);
// Save the PDF document
pdfLoadedDocument.Save(Path.GetFullPath(@"Output/Output.pdf"));
```
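Put together, steps 3 to 7 might look like the following Program.cs. This is a minimal sketch, assuming a .NET 6+ console project with top-level statements and the same Data/Input.pdf and Output/Output.pdf paths used above. The Directory.CreateDirectory call and the final check, which reloads the saved file and prints the first page's text with PdfPageBase.ExtractText, are illustrative additions rather than part of the steps above.

```csharp
using System;
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

string inputPath = Path.GetFullPath(@"Data/Input.pdf");
string outputPath = Path.GetFullPath(@"Output/Output.pdf");

// Make sure the Output folder exists before saving (illustrative addition)
Directory.CreateDirectory(Path.GetDirectoryName(outputPath)!);

// Step 4: initialize the Tesseract-based OCR processor
using (OCRProcessor processor = new OCRProcessor())
{
    // Step 5: load the scanned PDF document
    using (PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(inputPath))
    {
        // Step 6: recognize English text
        processor.Settings.Language = Languages.English;

        // Step 7: perform OCR, then save the searchable PDF
        processor.PerformOCR(pdfLoadedDocument);
        pdfLoadedDocument.Save(outputPath);
    }
}

// Optional check: reload the output and print the recognized text of the first page
using (PdfLoadedDocument searchable = new PdfLoadedDocument(outputPath))
{
    PdfPageBase firstPage = searchable.Pages[0];
    Console.WriteLine(firstPage.ExtractText());
}
```

If the printed text is empty, the page images may be too low in resolution, or the language setting may not match the document.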