Optical Character Recognition (OCR)
Optical Character Recognition (OCR) is a technology that converts scanned paper documents, in the form of PDF files or images, into searchable or editable data. Paper documents such as brochures, invoices, and contracts are often scanned so that they can be sent via email. Scanning converts the document into dots of different colors, known as a raster image. To extract the data and repurpose the content of the document, an OCR engine is necessary. The OCR engine detects the characters present in an image and assembles those characters into words and then into sentences, so that the content of the document can be searched and edited.
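To make the "characters into words" step concrete, here is a minimal conceptual sketch in Python. It is not the Syncfusion or Tesseract implementation. The `Glyph` type, the `group_into_words` function, and the `gap_threshold` value are hypothetical names chosen for illustration, and the sketch assumes the engine has already recognized each character and its horizontal position on a single line of text.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str   # character recognized by the (hypothetical) engine
    x: int      # left edge of the character, in pixels
    width: int  # width of the character, in pixels


def group_into_words(glyphs, gap_threshold=6):
    """Group glyphs on one text line into words.

    A new word starts whenever the horizontal gap between two
    neighboring glyphs is larger than gap_threshold pixels.
    """
    words = []
    current = ""
    prev_right = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev_right is not None and g.x - prev_right > gap_threshold:
            words.append(current)
            current = ""
        current += g.char
        prev_right = g.x + g.width
    if current:
        words.append(current)
    return words


# Illustrative input: five glyphs with a wide gap after the third one.
line = [
    Glyph("O", 0, 10), Glyph("C", 12, 10), Glyph("R", 24, 10),
    Glyph("i", 50, 4), Glyph("s", 56, 8),
]
words = group_into_words(line)
sentence = " ".join(words)
print(words, sentence)
```

A real engine also has to segment lines, handle skew and varying font sizes, and use a language dictionary to correct misread characters, so this sketch only illustrates the grouping idea.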
The following assemblies are required to use the OCR feature in your application.
Steps to convert a scanned image to a searchable PDF programmatically:
The dictionary packs for the other languages can be downloaded from the following online location:
Note: You can get the Tesseract binaries (SyncfusionTesseract.dll and liblept168.dll) and the language pack (tessdata) by downloading the OCR processor zip file from the Add-On section at the following link.
By executing the program, you will get the PDF document as follows.
Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to the Syncfusion licensing documentation to learn how to generate and register a Syncfusion license key in your application so that you can use the components without the trial message.