Perform OCR on PDFs in C# with the .NET PDF Library
The Syncfusion® .NET optical character recognition (OCR) library enables accurate text extraction from scanned PDFs and images. With just a few lines of C# code, it converts raster‑based PDF pages into fully searchable and selectable PDF documents. Developers can export the recognized content as plain text, structured data formats, or searchable PDF files. Powered by the advanced Tesseract OCR engine, the .NET OCR library delivers reliable and high‑quality text recognition for a wide range of document‑processing scenarios.
Perform OCR on PDF documents in C#
Learn how to programmatically perform optical character recognition (OCR) on scanned PDF documents in C# using the Syncfusion OCR processing library. This guide demonstrates text extraction from scanned images using the Tesseract engine.
Step 1: Create a new C# Console Application project
Begin by creating a new C# Console Application project in Visual Studio or your preferred IDE to implement PDF OCR functionality.
Step 2: Install Syncfusion PDF OCR NuGet package
Install the Syncfusion.PDF.OCR.Net.Core NuGet package in your C# project from NuGet.org. This package provides Tesseract-based OCR capabilities for PDF documents.
Step 3: Add required namespaces for PDF OCR processing
Import the following namespaces in your Program.cs file to access OCR processor classes and PDF parsing methods:
using System.IO; // for Path.GetFullPath; already in scope if implicit usings are enabled
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;
Step 4: Initialize the OCR processor
Create an instance of the OCRProcessor class. This processor uses the Tesseract engine to perform optical character recognition on scanned PDF pages.
// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor())
{
}
Step 5: Load the scanned PDF document
Use the PdfLoadedDocument class to load your scanned PDF file that contains images or non-searchable text.
// Load the PDF document
using (PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(Path.GetFullPath(@"Data/Input.pdf")))
{
}
Step 6: Configure OCR language settings
Set the OCR language to match the text in your scanned document. This ensures accurate character recognition for the specified language.
// Set OCR language to process
processor.Settings.Language = Languages.English;
Step 7: Perform OCR and save the PDF document
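Call the PerformOCR method on the loaded PDF document to recognize the text in its scanned images and make the content searchable, then call the Save method to write the processed document to disk. These calls must be placed inside the using blocks from Steps 4 and 5, where processor and pdfLoadedDocument are still in scope.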
// Process OCR by providing the PDF document
processor.PerformOCR(pdfLoadedDocument);
// Save the PDF document
pdfLoadedDocument.Save(Path.GetFullPath(@"Output/Output.pdf"));
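The snippets from Steps 3 to 7 can be assembled into a single Program.cs. The following is a minimal sketch, not an official sample: it uses only the calls shown in the steps, plus one addition that is an assumption of this example, a Directory.CreateDirectory call that makes sure the Output folder exists before saving. The Data and Output folder names are taken from the steps.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

class Program
{
    static void Main()
    {
        // Paths from Steps 5 and 7; relative paths resolve against the
        // current working directory.
        string inputPath = Path.GetFullPath(@"Data/Input.pdf");
        string outputDir = Path.GetFullPath("Output");
        string outputPath = Path.Combine(outputDir, "Output.pdf");

        // Addition for this sketch (not part of the original steps):
        // create the output folder if it does not exist yet.
        Directory.CreateDirectory(outputDir);

        // Step 4: initialize the OCR processor.
        using (OCRProcessor processor = new OCRProcessor())
        {
            // Step 5: load the scanned PDF document.
            using (PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(inputPath))
            {
                // Step 6: set the OCR language to match the document text.
                processor.Settings.Language = Languages.English;

                // Step 7: run OCR on the document, then save the result.
                processor.PerformOCR(pdfLoadedDocument);
                pdfLoadedDocument.Save(outputPath);
            }
        }
    }
}
```

Nesting the two using blocks keeps both processor and pdfLoadedDocument in scope for Steps 6 and 7 and disposes them in reverse order once the file has been saved. Run the program from a directory that contains Data/Input.pdf.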
