
How to perform OCR on a PDF document in an Azure environment

Platform: ASP.NET Web Forms
Control: PDF
Tags: ocr, azure

Step 1:

Create an Azure website project and reference the following assemblies in it:

  1. Syncfusion.Compression.Base.dll
  2. Syncfusion.Pdf.Base.dll
  3. Syncfusion.OCRProcessor.Base.dll

Step 2:

Add the Tesseract binaries and Tesseract data to the project in separate folders as embedded resources.

The screenshot below shows the Tesseract binaries and Tesseract data added separately inside the App_Data folder of the project.

[Screenshot: Tesseract binaries and data]

Once the files are added, rebuild the project. The added files then appear inside the project's “bin” folder.

Step 3:

Now reference the Tesseract binaries and Tesseract data from the “bin” folder, as shown in the code snippet below.

C#:

using (OCRProcessor.OCRProcessor processor = new OCRProcessor.OCRProcessor(Server.MapPath("~/bin/App_Data/Tesseract_Binaries/")))
{
    // Load a PDF document
    Stream fileStream = File.OpenRead(Server.MapPath("~/bin/App_Data/input.pdf"));
    PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);

    // Set OCR language and perform OCR
    processor.Settings.Language = "eng";
    processor.PerformOCR(lDoc, Server.MapPath("~/bin/App_Data/Tesseract_Data/"));

    // Save and close the document
    lDoc.Save("Output.pdf", this.Response, HttpReadType.Save);
    lDoc.Close(true);
}

VB:

Using processor As New OCRProcessor.OCRProcessor(Server.MapPath("~/bin/App_Data/Tesseract_Binaries/"))
    'Load a PDF document
    Dim fileStream As Stream = File.OpenRead(Server.MapPath("~/bin/App_Data/input.pdf"))
    Dim lDoc As New PdfLoadedDocument(fileStream)

    'Set OCR language and perform OCR
    processor.Settings.Language = "eng"
    processor.PerformOCR(lDoc, Server.MapPath("~/bin/App_Data/Tesseract_Data/"))

    'Save and close the document
    lDoc.Save("Output.pdf", Me.Response, HttpReadType.Save)
    lDoc.Close(True)
End Using

When this project is published to Azure, it references the Tesseract binaries and Tesseract data directly from the “bin” folder, and OCR can be performed with this code snippet.
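
Because the snippet depends on both folders being present under “bin” after publishing, a small pre-flight check can make a missing deployment easier to diagnose. The following is a minimal sketch and not part of the Syncfusion API: the class name TesseractPaths and the method FindMissing are hypothetical, and the folder names are assumed to match the paths used in the snippet above.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a Syncfusion type): reports which of the
// Tesseract folders expected by the OCR snippet are absent on disk.
public static class TesseractPaths
{
    // binRoot is the physical path of the site's "bin" folder,
    // for example the value returned by Server.MapPath("~/bin").
    public static string[] FindMissing(string binRoot)
    {
        // Folder names assumed from the paths used in the OCR snippet.
        string[] required =
        {
            Path.Combine(binRoot, "App_Data", "Tesseract_Binaries"),
            Path.Combine(binRoot, "App_Data", "Tesseract_Data")
        };

        // Keep only the directories that do not exist.
        return Array.FindAll(required, dir => !Directory.Exists(dir));
    }
}
```

In a page, this could be called as TesseractPaths.FindMissing(Server.MapPath("~/bin")) before constructing the OCR processor. A non-empty result suggests the Tesseract files were not copied to the published output.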
