Tesseract OPX

Introduction

Tesseract is an open source Optical Character Recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly or, for programmers, through an API to extract typed, handwritten, or printed text from images. Tesseract OPX makes it easy to use Tesseract with Microsoft .NET. It is also optimized to work with Syncfusion Essential PDF for .NET, so that it can process PDF documents whose images contain text. Together, Tesseract OPX and Essential PDF recognize the text in images within PDF documents and overlay those images with searchable text.

Assemblies Required

To use the OCR feature in your application, add references to the following assemblies:

Assembly Name                    Description
Syncfusion.Pdf.Base              Contains the core features for manipulating and saving PDF documents.
Syncfusion.Compression.Base      Compresses the internal contents of a PDF document.
Syncfusion.OCRProcessor.Base     Contains the core features for performing OCR on images and PDF documents.

Add the following namespaces to the application:

  • using Syncfusion.OCRProcessor;
  • using Syncfusion.Pdf.Parsing;

Performing OCR on a PDF document

You can perform OCR on a PDF document with the help of the OCRProcessor class. Place the SyncfusionTesseract.dll and liblept168.dll assemblies (available in the installed location <Installation Location>\Syncfusion\Essential Studio\<version number>\ocrprocessor) on the local system and provide the assembly path to the OCR processor.


Place the Tesseract language data (e.g., eng.traineddata), available in the installed location <Installation Location>\Syncfusion\Essential Studio\<version number>\ocrprocessor, on the local system and provide its path to the OCR processor.


You can also download the language packages from https://github.com/tesseract-ocr/tessdata

Please refer to the code snippet below.
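
The following is a minimal sketch of the steps described above, not the original sample. It assumes that the OCRProcessor constructor takes the folder containing the Tesseract binaries and that PerformOCR takes the loaded document and the language data folder. The folder names (TesseractBinaries, TessData) and file names (Input.pdf, Output.pdf) are hypothetical placeholders.

```csharp
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

class PdfOcrExample
{
    static void Main()
    {
        // Hypothetical folder holding SyncfusionTesseract.dll and liblept168.dll.
        string tesseractBinaries = @"TesseractBinaries\";
        // Hypothetical folder holding the language data, e.g. eng.traineddata.
        string tessData = @"TessData\";

        // Give the OCR processor the path to the Tesseract binaries.
        using (OCRProcessor processor = new OCRProcessor(tesseractBinaries))
        {
            // Load the existing PDF document (hypothetical file name).
            PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");

            // Select the language that matches the traineddata file.
            processor.Settings.Language = Languages.English;

            // Recognize the text in the page images and overlay it as searchable text.
            processor.PerformOCR(document, tessData);

            // Save the processed document and release its resources.
            document.Save("Output.pdf");
            document.Close(true);
        }
    }
}
```

The language setting and the language data file have to agree: with Languages.English, the language data folder is expected to contain eng.traineddata.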


Performing OCR for a region of the document:
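
As a hedged sketch (not the original sample), region-based OCR can be set up by describing the rectangles to process on each page before calling PerformOCR. The PageRegion type, its PageIndex and PageRegions members, and the Settings.Regions property are used here as they are understood from the Syncfusion OCR API; verify them against your installed version. The rectangle coordinates, folders, and file names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

class RegionOcrExample
{
    static void Main()
    {
        // Hypothetical folders for the Tesseract binaries and language data.
        string tesseractBinaries = @"TesseractBinaries\";
        string tessData = @"TessData\";

        using (OCRProcessor processor = new OCRProcessor(tesseractBinaries))
        {
            PdfLoadedDocument document = new PdfLoadedDocument("Input.pdf");
            processor.Settings.Language = Languages.English;

            // Hypothetical region on the first page: x, y, width, height.
            RectangleF rect = new RectangleF(0, 100, 950, 150);

            // Attach the rectangle to page index 0.
            PageRegion region = new PageRegion();
            region.PageIndex = 0;
            region.PageRegions = new RectangleF[] { rect };

            // Restrict OCR to the listed regions only.
            List<PageRegion> pageRegions = new List<PageRegion>();
            pageRegions.Add(region);
            processor.Settings.Regions = pageRegions;

            processor.PerformOCR(document, tessData);

            document.Save("Output.pdf");
            document.Close(true);
        }
    }
}
```

Limiting OCR to known regions can reduce processing time when only part of a page, such as a header or a form field, contains the text of interest.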


Performing OCR on an image

You can also perform OCR on an image. Refer to the code snippet below for a demonstration.
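
A minimal sketch of image OCR follows; it is not the original sample. It assumes a PerformOCR overload that accepts a System.Drawing.Bitmap together with the language data folder and returns the recognized text as a string. The folder and file names are hypothetical.

```csharp
using System;
using System.Drawing;
using Syncfusion.OCRProcessor;

class ImageOcrExample
{
    static void Main()
    {
        // Hypothetical folders for the Tesseract binaries and language data.
        string tesseractBinaries = @"TesseractBinaries\";
        string tessData = @"TessData\";

        using (OCRProcessor processor = new OCRProcessor(tesseractBinaries))
        using (Bitmap image = new Bitmap("Input.jpg")) // hypothetical image file
        {
            // Select the language that matches the traineddata file.
            processor.Settings.Language = Languages.English;

            // Run OCR on the image and collect the recognized text.
            string text = processor.PerformOCR(image, tessData);

            Console.WriteLine(text);
        }
    }
}
```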

